Method and system for ontology driven data collection and processing

ABSTRACT

Systems and methods to aid in the collection, representation and mining of data are disclosed. More particularly, embodiments as disclosed may utilize a unifying format to represent data obtained or utilized by a system to facilitate linking between data from different sources and the commensurate ability to mine such data. Specifically, embodiments may represent data as graphs that comprise the concepts and relationships between those concepts. In this manner, concepts in graphs that represent distinct groupings of data may be mapped and knowledge mining with respect to these graphs facilitated.

RELATED INFORMATION

This application is a continuation of, and claims a benefit of priority under 35 U.S.C. 120 of the filing date of U.S. patent application Ser. No. 12/928,463 entitled “METHOD AND SYSTEM FOR ONTOLOGY DRIVEN DATA COLLECTION AND PROCESSING” filed on Dec. 13, 2010 by inventor Parsa Mirhaji, which in turn claims a benefit of priority to the filing date of U.S. Provisional Patent Application Ser. No. 61/284,332 entitled “METHOD AND SYSTEM FOR TEXT UNDERSTANDING,” filed on Dec. 16, 2009 by inventor Parsa Mirhaji; U.S. Provisional Patent Application Ser. No. 61/284,331 entitled “METHOD AND SYSTEM FOR A SEMANTIC REPRESENTATION OF UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS) USING SIMPLE KNOWLEDGE ORGANIZATION SYSTEM (SKOS),” filed on Dec. 16, 2009 by inventor Parsa Mirhaji; U.S. Provisional Patent Application Ser. No. 61/284,330 entitled “METHOD AND SYSTEM FOR ONTOLOGY DRIVEN DATA COLLECTION,” filed on Dec. 16, 2009 by inventor Parsa Mirhaji, the entire contents of which are hereby expressly incorporated by reference for all purposes.

GOVERNMENT LICENSE RIGHTS

This invention was made with government support under W81XWH-04-2-0035 awarded by The U.S. Army Medical Research Acquisition Activity. The government has certain rights in the invention.

TECHNICAL FIELD

This disclosure relates generally to the field of informatics systems. In particular, this disclosure relates to the collection, integration and contextualization of information. More specifically, this disclosure relates to the collection of data using structured data entry in a distributed environment and processing of structured data to identify concepts and relationships according to ontologies.

BACKGROUND

With the increasing prevalence and use of computing systems the amount of data that can be obtained regarding various problem spaces has grown exponentially. While the amount of data that may be obtained with respect to a particular space may have increased significantly, the integration of heterogeneous data from multiple sources, the sharing of information in a distributed and collaborative environment and the mining of such data are challenging informatics problems. Nowhere are these types of challenges and problems more evident than in the case of a natural disaster or epidemic as the understanding, diagnoses, treatment and prevention of human diseases requires the collection, integration and understanding of information and knowledge from a wide variety of highly distributed sources which may present a unique challenge in such circumstances. This problem is exacerbated because most clinical research environments lack proper informatics resources and infrastructure to assist with preparation, implementation and maintenance of data collection and management platforms that can consistently and concurrently support collection, integration and contextualization of multiple research projects across many participating sites.

It is thus desired to provide advanced informatics platforms to enable complete, reliable and fast collection and validation of information throughout various research projects, and among different participating locations. Moreover, in conjunction with the collection of data for such systems it may be desired to process natural language (sometimes referred to as free text). This desire is particularly strong in the field of medicine, as free text entries in the form of discharge diagnoses, chief complaints, nurse and practitioner notes, diagnostic reports and consultations, etc. are an extremely important part of a patient electronic health record and are frequently unavailable for decision support and research queries due to their unstructured and unconstrained format. While human experts can effortlessly understand the meaning of the text and its implications in multiple different contexts (decision support, research, quality of care, etc.), or answer questions regarding patient health status, current computational processes are not able to process such health related free text to produce structured data that allows data mining of such free text.

SUMMARY

Systems and methods to aid in the collection, representation and mining of data are disclosed. More particularly, embodiments as disclosed may utilize a unifying format to represent data obtained or utilized by a system to facilitate linking between data from different sources and the commensurate ability to mine such data. Specifically, embodiments of informatics systems may represent data as graphs that comprise the concepts and relationships between those concepts. In this manner, concepts in graphs that represent distinct groupings of data may be mapped to each other and to other information and knowledge mining with respect to these graphs facilitated. By representing data in graphs, it may be possible to automate many processes that are involved in the integration and interpretation of multiple heterogeneous data sources and the utilization of computer based algorithms to mine such data, even when such data does not conform to standardized representation.

Embodiments of such informatics systems may utilize ontologies (also referred to as knowledgebases or models) to facilitate elements of their operation. Certain ontologies may be used to support the creation and distribution of data collection instruments and to contextualize the data returned according to the ontology. Ontologies may also be utilized to analyze data in a textual format such that the data may be contextualized according to the ontology. Other ontologies may be used to describe the format of data that may be received from one or more data sources such that obtained data may be contextualized according to that ontology when it is received from the corresponding data source. In this manner, obtained data may be represented in a graph according to an ontology.

To further contextualize obtained data, ontologies that represent collections of knowledge may be utilized. More specifically, ontologies that represent knowledge associated with a certain domain may be represented as a graph. Concepts in the graph representing obtained data may be mapped to the concepts of one or more ontologies representing domain knowledge. In this manner, obtained data may be placed in the context of a particular domain by unifying the graph representing obtained data and the graph representing the ontology for a particular domain.

These unified graphs then may be utilized to mine the obtained data. In particular, the unified graph may be queried or otherwise navigated based on the concepts or relationships in the domain ontology or one of the other ontologies to which the graph of the obtained data is mapped.

Embodiments of such systems and methods may be referred to as survey on demand systems, or SODS. While there are several survey design tools in the market, they mainly provide assistance in design and publication of surveys for online (web based) data entry and do not provide adequate methods of processing or understanding the semantics of, or relationships between, such data. Examples of such tools include Microsoft InfoPath, FrontPage, etc., each of which also enables creation of a database backend to collect the data in a systematic way.

One embodiment of a SODS is a comprehensive survey design and distributed information collection and integration platform. It can proactively capture ad-hoc data from multiple sources and transfer it through secure, private data links to a central repository. The data can be transformed into a semantic representation, mapped to ontologies and integrated into a core integrative platform that enables information processing and data mining. More particularly, one embodiment of SODS may be tailored to adapt to unprecedented events such as disasters or epidemics or to deploy to remote locations by allowing ad-hoc data collection and just-in-time information acquisition using a multiplicity of platforms from web based and PC based environments to PDAs that may be occasionally connected to a collaborative network or Internet.

Specifically, in one embodiment, an online and web based questionnaire may be designed and implemented using such a system. The surveys designed by this system are automatically deployed online to a Web portal, to small screen devices such as handhelds as well as tablet PCs, laptop computers, and PCs, etc. The information collected by all these platforms syncs back and integrates with the system such that data collected from all platforms and all surveys can be queried and interpreted collectively, even if the questionnaires and surveys have been deployed at different times and for different purposes.

Other embodiments may utilize a semantic representation of survey data for exchange and sharing of information online, controlled vocabulary and ontologies (for example, formal knowledge models) to enable and assist construction of surveys across projects and the ability to use vocabularies and taxonomies (including medical vocabularies such as SNOMEDCT) as part of the domain knowledge to construct surveys.

In one embodiment, the ability to construct a survey based on a survey ontology may be provided, including the ability to add concepts to the survey ontology, wherein the added concepts are mapped to the domain ontology asynchronously or automatically. The survey may be a graph representation of a set of questions mapped to the survey ontology and the survey response may be a graph representation of responses to the questions of the survey such that when a survey response is mapped to the survey a unified graph of the survey, survey response and the domain ontology is created.
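As a rough illustration of the preceding paragraph, the following Python sketch models a survey, a survey response and a fragment of a domain ontology as small sets of subject-predicate-object triples and unions them into one graph. Every identifier (survey:Q1, domain:Fever, the predicate names, the helper functions) is a hypothetical name chosen for illustration and is not taken from any embodiment.

```python
# Hypothetical sketch: a survey, a response and a domain fragment as triples.
# All node and predicate names below are illustrative assumptions.

survey_graph = {
    ("survey:Triage", "hasQuestion", "survey:Q1"),
    ("survey:Q1", "hasText", "Does the patient have a fever?"),
    ("survey:Q1", "mapsToConcept", "domain:Fever"),
}

response_graph = {
    ("response:R1", "answersQuestion", "survey:Q1"),
    ("response:R1", "hasValue", "yes"),
}

domain_graph = {
    ("domain:Fever", "isA", "domain:Symptom"),
}


def unify(*graphs):
    """Union the triple sets; shared node names link the graphs together."""
    unified = set()
    for graph in graphs:
        unified |= graph
    return unified


def responses_for_concept(graph, concept):
    """Return (response, value) pairs whose question maps to the concept."""
    questions = {s for (s, p, o) in graph
                 if p == "mapsToConcept" and o == concept}
    responses = {s for (s, p, o) in graph
                 if p == "answersQuestion" and o in questions}
    return sorted((s, o) for (s, p, o) in graph
                  if p == "hasValue" and s in responses)


unified_graph = unify(survey_graph, response_graph, domain_graph)
print(responses_for_concept(unified_graph, "domain:Fever"))
```

In this sketch the response is reachable from the domain concept only because the question node is shared between the survey graph and the response graph, which is the sense in which mapping a response to a survey yields a unified graph.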

Embodiments of such systems and methods may provide the advantages of deploying surveys in multiple platforms, including Web Based Forms (including, for example, iPhone or Android phones) for Data Entry; PDA Based Application for Data Entry; PC (Windows) based Application for Data Entry; etc.

In one embodiment, an informatics system may utilize a substantially automated method of creating a unified graph based on a structured dataset (which may, for example, be received from a data source), such as an XML document formed as an XML message or the like, or data formed according to a database schema employed by a data source. Specifically, in one embodiment, the structured dataset may be received and an ontology that describes the structure or types of data from the data source may be constructed. A graph representing the actual data of the dataset may then be constructed based on the ontology describing the structured data to create a unified graph comprising the ontology and the graph representation of the data of the dataset. This unified graph may then be used for a variety of purposes. For example, in one embodiment, concepts in the ontology may be mapped to a domain ontology or the like such that a unified graph can be created from the ontology representing the source, the graph representing the data of the structured data and the domain ontology. Such a unified graph can then be searched according to the concepts and relationships of the domain ontology.
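The two-step process described above (derive an ontology from the structure of the dataset, then represent the data according to it) can be sketched as follows. This is a minimal sketch under stated assumptions: the XML message, the `src:` and `data:` prefixes, the `src:hasChild` and `src:hasValue` predicates and the `build_graphs` function are all hypothetical, and XML attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical message from a data source; tags and values are made up.
XML_MESSAGE = """<patient>
  <name>Jane Doe</name>
  <visit>
    <complaint>cough</complaint>
  </visit>
</patient>"""


def build_graphs(xml_text):
    """Return (source_ontology, data_graph) as sets of triples."""
    root = ET.fromstring(xml_text)
    ontology = set()  # structure: element types and how they nest
    data = set()      # instances: the actual values in the message
    counts = {}       # per-tag counter used to name instance nodes

    def walk(element, parent):
        cls = "src:" + element.tag
        ontology.add((cls, "rdf:type", "owl:Class"))
        counts[element.tag] = counts.get(element.tag, 0) + 1
        node = "data:%s_%d" % (element.tag, counts[element.tag])
        data.add((node, "rdf:type", cls))
        if parent is not None:
            parent_cls, parent_node = parent
            ontology.add((parent_cls, "src:hasChild", cls))
            data.add((parent_node, "src:hasChild", node))
        text = (element.text or "").strip()
        if text:
            data.add((node, "src:hasValue", text))
        for child in element:
            walk(child, (cls, node))

    walk(root, None)
    return ontology, data


source_ontology, data_graph = build_graphs(XML_MESSAGE)
# The rdf:type triples tie each data node to a class of the source ontology,
# so the union of the two sets behaves as one graph.
unified_graph = source_ontology | data_graph
for triple in sorted(unified_graph):
    print(triple)
```

Mapping a class such as `src:complaint` to a concept in a domain ontology would then make every instance typed by that class searchable through the domain concept.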

Embodiments may also create a unified central repository that integrates data from multiple forms and surveys into one single unit of analysis and retrieval and provide built in authentication, security and auditing to control access and retrieval of information based on users' roles. Embodiments may also support an occasionally connected mode (seamless operation regardless of internet connectivity, with automatic synchronization back to the database when the connection is established) and automated updates of the latest changes to the survey at connection time (if more questions are added, or existing ones are modified or deleted, the survey responders will automatically see the latest version on the fly, immediately after it is submitted by the form designer).

Embodiments presented herein may enable complete, reliable and fast collection and integration of heterogeneous information. More specifically, embodiments may provide an informatics platform where collected data can be normalized, integrated and mapped to vocabulary systems, such as medical vocabulary systems. Any change in the original context or structure of the data collection instruments can be incorporated throughout the whole system, and integrated data may need to be stored in a format that can be repurposed to support data mining without losing or distorting the semantics or context of the original data.

Embodiments as disclosed may comprise a system for ontology driven data mining, comprising an informatics system coupled to a plurality of data sources, wherein the informatics system can receive an input from one or more of the plurality of data sources, create a graph representation of the input, obtain a graph representation of an ontology, wherein the ontology comprises a set of concepts and a set of relationships, and map the graph representation of the input to the graph representation of the ontology to create a unified graph comprising the graph representation of the input and the graph of the ontology. The ability to construct a query based on at least one of the set of concepts or at least one of the set of relationships of the ontology may also be provided such that the unified graph may be searched based on the query to obtain data of the input associated with the at least one concept or the at least one relationship.

These, and other, aspects of the invention will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. The following description, while indicating various embodiments of the invention and numerous specific details thereof, is given by way of illustration and not of limitation. Many substitutions, modifications, additions or rearrangements may be made within the scope of the invention, and the invention includes all such substitutions, modifications, additions or rearrangements.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings accompanying and forming part of this specification are included to depict certain aspects of the invention. A clearer impression of the invention, and of the components and operation of systems provided with the invention, will become more readily apparent by referring to the exemplary, and therefore nonlimiting, embodiments illustrated in the drawings, wherein identical reference numerals designate the same components. Note that the features illustrated in the drawings are not necessarily drawn to scale.

FIG. 1 depicts one embodiment of a method by which informatics systems may operate.

FIG. 2 depicts one embodiment of an informatics system integrated into a topology of a medical environment.

FIG. 3 depicts a portion of a survey ontology.

FIG. 4 depicts one embodiment of a method to gather and mine data based on a survey ontology.

FIG. 5 depicts one embodiment of the composition of a form as a graph representation in the ontology.

FIG. 6 depicts one embodiment of the definition of an enumerated question in conjunction with a survey in the ontology.

FIG. 7 depicts one embodiment of the linking of an enumerated question to concepts that define valid value sets (response options) for the question according to the ontology.

FIG. 8 depicts one embodiment of the mapping between value sets and an ontology to enable contextualization of the responses according to an external source of knowledge. Each value can be mapped to a set of concepts from a set of ontologies ad-hoc, for further contextualization.

FIG. 9 depicts one embodiment of a concept assigned to an enumerated question that can be mapped to a set of domain ontologies for further contextualization.

FIG. 10 depicts one embodiment of the definition of a question as a graph representation in an ontology.

FIG. 11 depicts one embodiment of the mapping of a question to a concept that controls the graphical user interface representation of that concept in the client application.

FIG. 12 depicts one embodiment of the configuration of a user interface style concept within the ontology.

FIG. 13 depicts a portion of a survey ontology represented as a graph.

FIG. 14 depicts one embodiment of a portion of a survey ontology represented as a graph.

FIG. 15 depicts one embodiment of a graph that represents the response concepts.

FIGS. 16A-B depict one embodiment of a graph that represents the response concepts when new questions and responses need to be recorded based on one of the previous responses.

FIG. 17 depicts relationships inside a survey ontology that automate the design and construction of conventional relational databases out of the graph representation. Ontological representation of these relationships between domain concepts, questions, their datatypes, responses and relationships allows computer code to automatically generate a relational database schema that best represents the underlying ontological representation of surveys and their responses. If the survey structure changes by human interaction, the nature of these relationships will change and as a result a new database schema may be generated to account for the change in the design of the surveys.

FIG. 18 depicts one embodiment of a method for the construction and population of a relational database schema based on the relationships depicted in FIG. 17.

FIGS. 19A-D depict a listing of a relational database schema generated by one embodiment of the system.

FIGS. 20A-B depict one embodiment of an interface generated by the client application.

FIG. 21 depicts one embodiment of a question response along with recording of the change and update history for any given response as a graph representation. This graph maps and integrates with the rest of the survey response graph, survey ontology and domain knowledge as a unified whole.

FIG. 22 depicts one embodiment of a survey response.

FIGS. 23A-C depict one embodiment of a survey response inside an ontology and mapped to the survey ontology and domain concepts.

FIG. 24 depicts one embodiment of a method to process text.

FIG. 25 depicts one embodiment of concepts defined in a syntax ontology.

FIG. 26 depicts one embodiment of a class definition to define negation syntactically.

FIG. 27 depicts one embodiment of a portion of the UMLS-SKOS domain ontology.

FIG. 28 depicts one embodiment of a biomedical concept in the UMLS-SKOS domain ontology.

FIG. 29 depicts one embodiment of the expression of logical constraints in domain ontology.

FIG. 30 depicts one embodiment of a portion of a semantic ontology.

FIG. 31 depicts one embodiment of a parse graph.

FIG. 32 depicts one embodiment of the output of a syntactic parser.

FIG. 33 depicts one embodiment of a unified graph as a result of mapping a parse graph to domain ontology and semantic ontology.

FIG. 34 depicts one embodiment of a conceptual graph.

FIG. 35 depicts one embodiment of formal RDF output of the text processing algorithm. The input text turns into a formal graph representation with all mapping needed to facilitate its integration and automated interpretation, navigation, search and retrieval.

FIG. 36 depicts one embodiment of a method for constructing an ontology for UMLS.

FIG. 37 depicts one embodiment of an ontology representing UMLS Semantic Network.

FIG. 38 depicts one embodiment of an example SAB class.

FIG. 39 depicts one embodiment of properties.

FIG. 40 depicts one embodiment of classes representing labels and terms in UMLS-SKOS ontology.

FIG. 41 depicts one embodiment of a CUI.

FIG. 42 depicts one embodiment of a concept and its SKOS relationships with other concepts.

FIG. 43 depicts one embodiment of a representation of a concept from a SAB and its relations to other concepts from the same SAB or other SABs.

FIG. 44 depicts one embodiment of the mapping between CUI and concepts from different SABs.

FIGS. 45A-B depict one embodiment of a portion of the UMLS-SKOS ontology encompassing UMLS Semantic Network, UMLS-MTH, and SABs all mapped together as a unified whole and represented as a graph.

FIG. 46 depicts one embodiment of a method for creating an ontology representing a data source based on structured data.

FIG. 47 depicts one embodiment of a method of creating an ontology representation of a data source and representing data from a data source according to the created ontology.

FIGS. 48A-B depict one embodiment of a method for an XML schema parser.

FIG. 49 depicts one embodiment of a method for an XML to RDF mapping.

FIGS. 50A-B depict one embodiment of a method for creating an ontology for a data source.

FIGS. 51A-B depict one embodiment of a method for representing data according to a source ontology.

FIG. 52 depicts one embodiment of a portion of a datatype model.

FIG. 53 depicts one embodiment of a portion of a core schema ontology.

FIG. 54 depicts one embodiment of an example source specific population of an XML model.

FIG. 55 depicts one embodiment of an ontology that is used to extend the TBOX.

FIG. 56 depicts a snapshot of one embodiment of a TBOX extracted from a graph.

FIG. 57 depicts one embodiment of a portion of an ABOX.

FIG. 58 depicts one embodiment of a converted XML message.

DETAILED DESCRIPTION

The invention and the various features and advantageous details thereof are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known starting materials, processing techniques, components and equipment are omitted so as not to unnecessarily obscure the invention in detail. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only and not by way of limitation. Various substitutions, modifications, additions and/or rearrangements within the spirit and/or scope of the underlying inventive concept will become apparent to those skilled in the art from this disclosure. Embodiments discussed herein can be implemented in suitable computer-executable instructions that may reside on a computer readable medium (for example, a HD), hardware circuitry or the like, or any combination.

As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, article, or apparatus that comprises a list of elements is not necessarily limited only to those elements but may include other elements not expressly listed or inherent to such a process, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).

Additionally, any examples or illustrations given herein are not to be regarded in any way as restrictions on, limits to, or express definitions of, any term or terms with which they are utilized. Instead, these examples or illustrations are to be regarded as being described with respect to one particular embodiment and as illustrative only. Those of ordinary skill in the art will appreciate that any term or terms with which these examples or illustrations are utilized will encompass other embodiments which may or may not be given therewith or elsewhere in the specification and all such embodiments are intended to be included within the scope of that term or terms. Language designating such nonlimiting examples and illustrations includes, but is not limited to: “for example,” “for instance,” “e.g.,” “in one embodiment”.

Before discussing specific embodiments, an embodiment of an architecture for implementing certain embodiments is described herein. One embodiment can include one or more computers communicatively coupled to a network. As is known to those skilled in the art, the computer can include a central processing unit (“CPU”), at least one read-only memory (“ROM”), at least one random access memory (“RAM”), at least one hard drive (“HD”), and one or more input/output (“I/O”) device(s). The I/O devices can include a keyboard, monitor, printer, electronic pointing device (such as a mouse, trackball, stylus, touchscreen, etc.), microphone, camera or the like. In various embodiments, the computer may have access to at least one database over the network.

ROM, RAM, and HD are computer memories for storing computer-executable instructions executable by the CPU. Within this disclosure, the term “computer-readable medium” is not limited to ROM, RAM, and HD and can include any type of data storage medium that can be read by a processor. In some embodiments, a computer-readable medium may refer to a data cartridge, a data backup magnetic tape, a floppy diskette, a flash memory drive, an optical data storage drive, a CD-ROM, ROM, RAM, HD, or the like.

At least portions of the functionalities or processes described herein can be implemented in suitable computer-executable instructions. The computer-executable instructions may be stored as software code components or modules on one or more computer readable media (such as non-volatile memories, volatile memories, DASD arrays, magnetic tapes, floppy diskettes, hard drives, optical storage devices, etc. or any other appropriate computer-readable medium or storage device). In one embodiment, the computer-executable instructions may include lines of compiled C++, Java, HTML, or any other programming or scripting code. At least portions of the functionalities implemented herein may be implemented in one or more modules. Each module may comprise one or more computer readable instructions configured to implement the functionality of that module. Modules may be combined or further divided, reside on one or multiple computer readable mediums, and the modules depicted herein should not be taken as in any way limiting the configuration or implementation of embodiments of the systems and methods depicted herein.

Additionally, the functions of the disclosed embodiments may be implemented on one computer or shared/distributed among two or more computers in or across a network. Communications between computers implementing embodiments can be accomplished using any electronic, optical, radio frequency signals, or other suitable methods and tools of communication in compliance with known network protocols.

A brief discussion of context, particularly with respect to data gathering systems may now be helpful. Integrating heterogeneous data from multiple sources and sharing information in a distributed and collaborative environment are challenging informatics problems. These challenges are particularly evident in a medical setting or in the case of a natural disaster or epidemic as understanding, diagnosing, treating and preventing human diseases requires the collection, integration and understanding of information and knowledge from a wide variety of highly distributed sources which may present a unique challenge in such circumstances.

To aid in the processing and understanding of such data it may be desired to provide an informatics system to aid in the collection, representation and mining of such data. Accordingly, attention is now directed to embodiments of methods and systems for such informatics systems. Such informatics systems may utilize a unifying format to represent data obtained or utilized by the system to facilitate linking between data from different sources and the commensurate ability to mine such data. In particular, embodiments of these types of informatics systems may represent data as graphs that comprise the concepts and relationships (also referred to as mapping or links) between those concepts. These graphs may be formal (computer interpretable) graphs that can be stored in a data store in a variety of formats. Graphs may be represented using the Resource Description Framework (RDF) of the Semantic Web. RDF is described in detail in the World Wide Web Consortium (W3C) recommendations and specifications, incorporated herein by reference in their entirety. In this manner, concepts in graphs that represent distinct groupings of data may be mapped and knowledge mining with respect to these graphs facilitated. By representing data in formal graphs, it may be possible to automate many processes that are involved in the integration and interpretation of multiple heterogeneous data sources and the utilization of computer based algorithms to mine such data, even when such data does not conform to standardized representation.
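To make the notion of a formal, computer interpretable graph concrete, the following Python sketch holds a graph as a set of subject-predicate-object statements and writes each one as a line in the style of the W3C N-Triples syntax. It is a simplified sketch only: the `http://example.org/` namespace, the node names and the helper functions are assumptions made for illustration, and only backslash, double quote and newline are escaped in literals.

```python
# Placeholder namespace; an assumption made for this sketch.
BASE = "http://example.org/"


def iri(name):
    """Wrap a local name as an absolute IRI term."""
    return "<%s%s>" % (BASE, name)


def literal(text):
    """Write a plain string literal, escaping backslash, quote and newline."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"%s"' % escaped


def to_ntriples(statements):
    """Serialize (subject, predicate, object, object_is_literal) tuples."""
    lines = []
    for subject, predicate, obj, obj_is_literal in sorted(statements):
        obj_term = literal(obj) if obj_is_literal else iri(obj)
        lines.append("%s %s %s ." % (iri(subject), iri(predicate), obj_term))
    return "\n".join(lines)


# Hypothetical graph: one link between two concepts and one literal value.
graph = {
    ("patient1", "hasSymptom", "fever", False),
    ("patient1", "hasName", "Jane Doe", True),
}
print(to_ntriples(graph))
```

Because each statement stands alone, graphs from different sources can be combined by simply pooling their statements, which is one reason a triple-based representation lends itself to linking heterogeneous data.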

Specifically, embodiments of such informatics systems may utilize ontologies (also referred to as knowledge bases) to facilitate elements of their operation. Embodiments of these ontologies may be graphs represented in Web Ontology Language (OWL), a family of knowledge representation languages for authoring ontologies. The language may be characterized by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL is endorsed and described by the World Wide Web Consortium (W3C). The Semantic Web provides one language for creating ontologies that are computer understandable and available over a network; other approaches are possible.

Certain ontologies may be used to support the creation and distribution of data collection instruments and to contextualize the data returned according to the ontology. Ontologies may also be utilized to analyze data in a textual format such that the data may be contextualized according to the ontology. Other ontologies may be used to describe the format of data that may be received from one or more data sources such that obtained data may be contextualized according to that ontology when it is received from the corresponding data source. In this manner, obtained data may be represented in a graph according to an ontology.

To further contextualize obtained data, ontologies that represent collections of knowledge may be utilized. More specifically, ontologies that represent knowledge associated with a certain domain may be represented as a graph. Concepts in the graph representing obtained data may be mapped to the concepts of one or more ontologies representing domain knowledge. This mapping may be accomplished by establishing a relationship (such as a “same as” relationship) between the two concepts. In this manner, obtained data may be placed in the context of a particular domain by unifying the graph representing obtained data and the graph representing the ontology for a particular domain. As used herein the term unified graph is intended to mean any graph formed by mapping (either directly by mapping one concept to another or indirectly by mapping a concept to another concept that is in turn mapped to a third concept such that the original concept and the third concept are mapped) at least one concept in one graph with at least one concept in another graph, or any graph resulting from the addition of a concept and relationship to an existing graph (for example, by instantiating a concept and linking the concept to another concept in an existing graph).
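The direct and indirect mapping in this definition can be illustrated with a short Python sketch. It assumes a hypothetical `sameAs` predicate and made-up concept names, and uses a union-find structure (a technique chosen here for illustration, not one named in this disclosure) to group concepts that are mapped to one another through any chain of `sameAs` links.

```python
def same_as_groups(triples):
    """Group concepts connected directly or indirectly by sameAs links."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for subject, predicate, obj in triples:
        if predicate == "sameAs":
            root_s, root_o = find(subject), find(obj)
            if root_s != root_o:
                parent[root_s] = root_o

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def are_mapped(triples, concept_a, concept_b):
    """True if the two concepts are mapped, directly or through a chain."""
    return any(concept_a in group and concept_b in group
               for group in same_as_groups(triples))


# Hypothetical graphs: obtained data, two mappings, and a domain ontology.
data_graph = {("data:obs1", "hasFinding", "data:HighTemp")}
mappings = {
    ("data:HighTemp", "sameAs", "survey:Fever"),
    ("survey:Fever", "sameAs", "domain:Pyrexia"),
}
domain_graph = {("domain:Pyrexia", "isA", "domain:Symptom")}

unified_graph = data_graph | mappings | domain_graph
print(are_mapped(unified_graph, "data:HighTemp", "domain:Pyrexia"))
```

Here `data:HighTemp` is never linked to `domain:Pyrexia` by a single statement; the two are mapped only through `survey:Fever`, which corresponds to the indirect case in the definition above.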

These unified graphs may then be utilized to mine the obtained data. In particular, the unified graph may be queried or otherwise navigated based on the concepts or relationships in the domain ontology or one of the other ontologies to which the graph of the obtained data is mapped. This mapping enables rules-based and logic reasoning engines to be used for classification and enables such graphs to be reused and repurposed depending on the domain ontology to which the graph representing the obtained data is mapped. This means the same graph can be contextualized for a wide variety of uses, including for example, decision support, billing, research, case recruitment, quality of care assessment, etc., without the need to reprocess obtained data.

Accordingly, a cross-platform informatics system that provides distributed operation may be provided. Data may be obtained from a variety of sources and represented in an extensible, context independent format that supports multidisciplinary uses of the data using a representation framework that can be incrementally updated and extended to adapt to new specializations, and enable integration of new data. Such a format may provide data with an independent life cycle that is authenticated, and may be audited in a traceable and revertible fashion such that changes to the system objects or their associated values may be traceable, or revertible back to their original status in a systematic way. Obtained data may be contextualized according to any one of a number of ontologies. This contextualization allows a series of operations that can be automatically or interactively specialized and customized to meet specific requirements of individual projects related to a particular domain.

Reference is now made to FIG. 1, which presents one embodiment of a method by which such informatics systems may operate. Although embodiments as described herein will be presented throughout with reference to an informatics system that may be used in a medical environment, it should be noted that the systems and methods presented herein will be equally applicable in other environments and the context in which embodiments are described should in no way be taken as limitations on the applicability of such systems and methods.

At step 10 an informatics system that operates according to an embodiment of the present invention may obtain data from a variety of sources. At step 20 the obtained data may be represented as a graph and the graph representing the obtained data mapped to one or more ontologies to contextualize the data according to the ontology. One method of obtaining data for such an informatics system may involve the use of surveys. An ontology may describe the structure of a data collection instrument, including for example, projects, forms, surveys, order, group, attributes, etc. This type of ontology may be referred to herein as a survey ontology. Thus, a survey ontology may be a graph representation of an ontology configured for the implementation of surveys.

Using such a survey ontology a user of an informatics system may be presented with an interface which allows him to create one or more surveys. As the survey is constructed based on a survey ontology the survey may itself be represented as a graph such that the graph representing the survey and the survey ontology form a unified graph. In particular, a survey may be composed of a number of questions. These questions may reference certain concepts, where the concepts may not yet be represented in the graph representing the survey. In such cases a concept may be created and linked to the graph comprising the survey. In this manner, the graph representing the survey and the survey ontology can expand organically to encompass the concepts desired.

A user's device may communicate with the informatics system and obtain such a survey by obtaining the graph representation of the survey. Based on the graph representation of the survey a user interface may be rendered at the user's device to present the questions comprising the survey. The user may provide answers to these questions, where these answers are returned to the informatics system and represented as a graph such that the graph representing the user's answers to the survey forms a unified graph with both the graph representing the survey and the survey ontology.

Moreover, the concepts in the graph of the survey representing questions of the survey may have been mapped to concepts in one or more ontologies describing knowledge pertaining to a domain (referred to as a domain ontology or knowledge base). Thus, the mapping of the graph representing the user's answers to the survey to the graph representing the form may also serve to contextualize the answers by forming a unified graph between the graph representing the user's answers, the graph representation of the survey, the survey ontology and the domain ontology.

Data may also be obtained from text based sources. In a medical environment these sources may comprise, for example, an electronic medical records system (EMR), lab reports, medical charts, discharge diagnosis, chief complaint, nurse and practitioner notes, diagnostic reports and consultations, etc. This text may be input manually to the informatics system or received electronically. The text may be parsed according to a graph representation of an ontology representing syntactic knowledge (referred to as a syntactic ontology or syntax ontology), where the syntactic ontology utilized may be selected based upon the expected language, format, type of text, environment to which the text may pertain, etc. The result of the parsing may be a graph representation of the concepts and relationships of the text. The graph representing the text may thus form a unified graph with the syntax ontology.

This graph representation of the text may then be mapped to a domain ontology to form a unified graph comprising the graph representing the text, the syntax ontology and the domain ontology. Using the mappings between the graph representing the text and the domain ontology, and previously established mappings between the domain ontology and a semantic ontology, the graph representing the text may be mapped to a semantic knowledge base. In this manner, a unified graph comprising the graph representing the text, the domain ontology and the semantic ontology can be formed. The semantic ontology may be a generic and extensible ontology that represents the concepts that are likely to be found in text of the type being processed. A semantic ontology may serve as a high level schemata (information model) with a minimal set of semantic constraints that sufficiently represents major patterns identifiable in typical text of the type being processed and that enables extensions and mappings to more specialized ontologies to specialize it to meet particular requirements of a new use case or domain.

Data can also be obtained from a variety of data sources directly. Data may be received from these data sources, or an informatics system may obtain data from these data sources in another manner. The data may be obtained using a structured representation of the data such as an XML object. As data sources may have different structures for representing their data the informatics system may have a set of source ontologies, where each of the set of source ontologies corresponds to a particular data source or type of data source. When data is obtained from a data source the informatics system may utilize an ontology that corresponds to the data source from which the data was obtained. Using the ontology then, a graph of the obtained data may be created by processing the structured representation according to the corresponding ontology to represent the data from the source as a graph, where this graph is unified with the ontology for the source from which it was obtained. The graph of the obtained data can then be mapped to a domain ontology to create a unified graph comprising the graph of the obtained data and the domain ontology.
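As a non-limiting sketch of the processing just described, the following assumes that a source ontology can be reduced to a record class and a table relating XML element names to properties. The structure of SOURCE_ONTOLOGY, the src: and data: prefixes, and the sample XML are all assumptions made for illustration; an actual source ontology would be a graph rather than a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical source ontology for one data source: the class of each record
# and the property that each XML element name corresponds to.
SOURCE_ONTOLOGY = {
    "record_class": "src:LabResult",
    "fields": {"test": "src:hasTestName", "value": "src:hasValue"},
}

def xml_to_graph(xml_text, ontology, base="data:"):
    """Represent each child of the XML root as an instance typed by the source ontology."""
    triples = []
    root = ET.fromstring(xml_text)
    for index, record in enumerate(root):
        subject = f"{base}record{index}"
        # The rdf:type triple is what unifies the data graph with the source ontology.
        triples.append((subject, "rdf:type", ontology["record_class"]))
        for field in record:
            prop = ontology["fields"].get(field.tag)
            if prop is not None:  # elements the ontology does not describe are skipped
                triples.append((subject, prop, (field.text or "").strip()))
    return triples

sample = "<results><result><test>Hemoglobin</test><value>13.5</value></result></results>"
graph = xml_to_graph(sample, SOURCE_ONTOLOGY)
```

A different data source would be handled by supplying a different source ontology to the same function, leaving the processing itself unchanged.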

Once data has been obtained, represented as a graph and a unified graph comprising the obtained data and at least one ontology is formed to contextualize the data, the unified graph may be data mined at step 30. More specifically, an interface may be provided to a user to query the unified graph. This interface may present to the user a list of concepts or relationships utilized in the domain ontology or the semantic ontology comprising the unified graph. The user can thus construct a query utilizing the concepts or relationships of the ontology, and obtained data may be searched and organized according to those concepts or relationships.

The unified graph may be searched according to the query constructed by the user utilizing SPARQL Protocol and RDF Query Language (SPARQL), which was standardized by the RDF Data Access Working Group of the W3C and is an official W3C recommendation. SPARQL allows for a query to comprise triple patterns, conjunctions, disjunctions, optional patterns, etc. SPARQL also allows federated queries, where the query is distributed to multiple locations or computed in a distributed manner and the results from the distributed query are gathered.
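To illustrate what a conjunction of triple patterns means, the following sketch evaluates such a conjunction over an in-memory list of triples. It is not a SPARQL engine and implements none of the SPARQL protocol; it only mimics how variables (terms beginning with “?”) are bound consistently across patterns. The survey: and dom: identifiers and the sample data are hypothetical. The equivalent SPARQL query, with prefix declarations omitted, is shown in the comment.

```python
# Equivalent SPARQL (prefix declarations omitted; identifiers are hypothetical):
#   SELECT ?r WHERE {
#     ?q survey:aboutConcept dom:BloodTransfusion .
#     ?r survey:answers ?q .
#     ?r survey:hasValue "Yes" .
#   }

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def run_query(graph, patterns):
    """Evaluate a conjunction of triple patterns and return every consistent binding."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

unified_graph = [
    ("data:q1", "survey:aboutConcept", "dom:BloodTransfusion"),
    ("data:resp1", "survey:answers", "data:q1"),
    ("data:resp1", "survey:hasValue", "Yes"),
    ("data:resp2", "survey:answers", "data:q1"),
    ("data:resp2", "survey:hasValue", "No"),
]
patterns = [
    ("?q", "survey:aboutConcept", "dom:BloodTransfusion"),
    ("?r", "survey:answers", "?q"),
    ("?r", "survey:hasValue", "Yes"),
]
results = run_query(unified_graph, patterns)
```

The first pattern is expressed in terms of a domain concept while the remaining patterns traverse the survey data, which is the sense in which a query may be formed from the concepts of an ontology and answered from obtained data.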

Thus, the interface presented to the user may provide an open framework for the user to construct queries according to the context of a particular ontology. These queries can be translated into SPARQL and run against the unified graph comprising the ontology and data obtained from users to provide the user who initiated the query with data obtained from users that is relevant to the query. In this manner users are provided with a highly effective and contextual method for extracting meaning from obtained data. Specifically, the interface may present the users with the set of concepts or relationships utilized in the ontology to allow the user to form queries based on these concepts and relationships. Searches can then be formed and conducted based on the ontology used to contextualize the data.

As can be seen then, embodiments of such an informatics system may provide methods of gathering data from various sources which allow the data to be contextualized according to a desired ontology and the data queried according to that ontology. By representing data in a graph form, the data may be recontextualized and queried according to substantially any desired ontology without either obtaining or reformatting such data again.

As has been noted above, embodiments of such informatics systems may be applied to almost any desired context, where the selection of certain ontologies to utilize in conjunction with any particular embodiment may be dependent, at least in part, on the desired context. Thus, for example, the syntactic ontology may be selected based on what type of text is expected, the semantic and domain ontology may be selected in order to contextualize obtained data according to a desired context such that the obtained data can be mined according to those ontologies.

Though embodiments of informatics systems may be useful in many contexts, certain embodiments may be particularly useful in the context of medical environments and generally in the field of medicine. This is because in the medical field free text entries in the form of discharge diagnoses, chief complaints, nurse and practitioner notes, diagnostic reports and consultations, etc., are an extremely important part of a patient electronic health record, yet are frequently unavailable for decision support and research queries due to their unstructured and unconstrained format. While human experts can effortlessly understand the meaning of the text and its implications in multiple different contexts (decision support, research, quality of care, etc.), or answer questions regarding patient health status, current computational processes are not able to process such health related free text to produce a structured data output that allows data mining of such free text, such as question answering and information integration. Furthermore, in the case of a natural disaster or epidemic, understanding, diagnosing, treating and preventing human diseases requires the collection, integration and understanding of information and knowledge from a wide variety of highly distributed sources, which may present a unique challenge in such circumstances. Accordingly, in most medical environments it is desired to have effective informatics systems.

Moving now to FIG. 2, one embodiment of an informatics system integrated into a topology of a medical environment is depicted. Informatics system 110 allows for obtaining data from various data sources 100, representing the obtained data as a graph, mapping the graph to one or more ontologies, and the mining of the obtained data based on the ontology to which it is mapped. These data sources 100 may comprise almost any type of computing device from which it is desired to obtain data, including database systems; user devices such as computers, mobile phones, personal data assistants; electronic medical records (EMR) systems; etc., where the data sources 100 may be coupled to informatics system 110 through network 170. Network 170 may be almost any type of wired or wireless communication medium, including for example, a LAN, a WAN, an intranet, the Internet, etc. Informatics system 110 may communicate with data sources 100 over the network 170 utilizing a service oriented architecture, for example, Web Services or the like. Such an architecture may create modularized and asynchronous connectivity that allows any number of disparate data sources 100 to communicate with the informatics system 110 in a uniform, asynchronous and consistent way.

Informatics system 110 may comprise a data store 130, where the data store is configured to store graph representations of both ontologies 132 and source data 150. As mentioned above, a graph may be a formal graph, which is a computer interpretable graph representation (an example of which is the Resource Description Framework (RDF) from the Semantic Web framework of technologies). Thus, such graphs may be stored in the data store 130 according to almost any format desired, as long as the graph can be derived. Data store 130 may therefore be, for example, a native triple store or a non-native triple store that may be utilized with a converter between a relational database and a graph representation, such as an Oracle Database 10g. Data store 130 may also represent the graphs according to other knowledge representation schemes, including relational databases, XML objects, serializable objects, flat files, etc.

Ontologies 132 include at least one survey ontology 134, syntax ontology 135, semantic ontology 136, domain ontology 138 and source ontology 140, while source data 150 may comprise data generated by users directly through the informatics system 110 or users at data sources 100, data input to the informatics system 110 by a user directly or indirectly, or data otherwise obtained from one or more of data sources 100. Thus, source data 150 may include graph representations of: surveys 152, source data 154, survey responses 156 and text 158.

Survey ontology 134 may be an ontology configured for the ad-hoc collection and mapping of data in a distributed and collaborative environment. Survey ontology 134 may enable clinical researchers, practitioners, epidemiologists, public health researchers, responders, etc. to interactively design and deploy dynamic data collection instruments (such as clinical research forms, surveys, questionnaires, data abstraction forms) on an array of hardware, software, and network platforms (web, PDA, tablet PC based) that can seamlessly operate in a collaborative, multi-organizational environment regardless of the continuous availability of a reliable communication network.

Survey ontology 134 may be a unified graph comprised of multiple sub-graphs, where each sub-graph is configured to enable a competency by representing the concepts and relationships associated with a competency. Examples of such competencies are Project Management (comprising, for example, concepts such as users, groups of users, sites, authentication rights and roles), Vocabulary Services (comprising, for example, concepts for managing local vocabularies, mapping to Standard Vocabularies or other Meta-Thesauri), Survey Management (comprising, for example, concepts for managing data collection instruments such as forms and questions, question options, question context, and their relationships with sites, groups and projects), Human-Computer Interface (comprising, for example, concepts for managing and describing the behavior of the UI objects to interact between instrument components and human users in different hardware and software platforms), Survey Templates (comprising, for example, concepts such as questions, form templates and Containers to manage an individual or a set of questions within their containers such that both questions and form templates could be reused, reconfigured and combined to construct new data collection instruments), and Validation and Quality Control (comprising, for example, concepts for single value validation, multi-value associative validation, multi-form associative validation, multi-project associative validation, etc.). It will be noted that these competencies are examples only and that more or fewer competencies may be implemented.

In one embodiment, the survey ontology 134 may be represented using RDF/OWL. That is, the survey ontology 134 may be maintained as an OWL ontology. The graph representation of all models and meta-data, along with modular design and separation of the objects through assignment of an independent and globally unique uniform resource identifier (URI) to all concepts, may enable a complete view of all data and meta-data at any given time in a way that they can sustain functionalities in the informatics system. All objects and concepts within survey ontology 134 (for example, users, groups, sites, clients, vocabulary sets, questions, answers, options, GUI elements and styles, etc.) may be given, and identified by, a single globally unique URI that can be used to further characterize, classify, identify, retrieve or communicate the object with any and all systems and services.

Syntax ontology 135 is a graph representation of the potential content of string based data received by the informatics system 110, including for example a token dictionary, terminological knowledge or a lexicon. Such an ontology may represent the basic syntactic constructs that may be used by a parser to identify a sentence, and its pieces in order to parse it to a minimum number of legitimate tokens. As a parser may be language independent and have no grammatical commitment to a certain language, this syntax ontology 135 may establish a basis for identifying certain linguistic expressions that can be used by the parser to identify differences in data types (for example, Date, Time, Number, negation, etc.), and some syntactic cues that may be reliably used for segmentation of a sentence (for example, delimiters such as “,” or “.”).

Specifically, in the setting of processing clinical text, embodiments of the syntax ontology comprise minimal knowledge of the English language in terms of its basic syntactic elements (for example, Negation marks, delimiters (for example, space, −, /), punctuation (for example, “.”, “,”, “;”), Acronyms (for example, MI=Myocardial Infarction), Numbers (for example, xsd:float, xsd:integer), Date (for example, xsd:DateTime), etc.) to define the existence of such concepts and their relationships in clinical text.

Syntax ontology 135 may also include a lexicon that allows a parser to identify surface expressions from clinical text that have non-biomedical semantics, for example, all categories of negation expression, uncertainty, names (of known real world objects, individuals, organizations, places), units of measurement, chemical elements and particles, etc. The syntactic ontology 135 may also include a lexicon for the generic and mainly non-clinical aspects of clinical content. Here, each lexeme may be represented in terms of a uniform resource identifier (URI) that can be referred to by many morphologically different symbols. Each lexeme is modeled as an instance of at least one semantic class in the Lexicon (for example, “ctm:Reject” models [reject, rejecting, rejected, rejects, . . . ]). Each class may have further semantics as inferred by its definition within the syntactic ontology 135 or mapping to any other set of ontologies.
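One possible way to picture such a lexicon is as a table relating each lexeme URI to its morphologically different surface forms, with a reverse index used by a parser. In the sketch below only “ctm:Reject” and its four surface forms come from the example above; the “ctm:Deny” lexeme, the “ctm:NegationExpression” class and the function name are assumptions introduced for illustration.

```python
# Lexeme URI -> morphologically different surface forms.
LEXICON = {
    "ctm:Reject": ["reject", "rejecting", "rejected", "rejects"],
    "ctm:Deny": ["deny", "denies", "denied", "denying"],  # hypothetical lexeme
}

# Lexeme URI -> semantic class in the lexicon (class name is hypothetical).
LEXEME_CLASS = {
    "ctm:Reject": "ctm:NegationExpression",
    "ctm:Deny": "ctm:NegationExpression",
}

# Reverse index so that a parser can go from a surface form to its lexeme.
SURFACE_INDEX = {form: uri for uri, forms in LEXICON.items() for form in forms}

def lookup(token):
    """Return (lexeme URI, semantic class) for a token, or None if it is not in the lexicon."""
    uri = SURFACE_INDEX.get(token.lower())
    if uri is None:
        return None
    return uri, LEXEME_CLASS[uri]

hit = lookup("Rejected")
miss = lookup("fever")
```

Because every surface form resolves to the same URI, any further semantics attached to that URI apply to all of its forms.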

Semantic ontology 136 may provide a generic and extensible ontology for prototypical clinical content. This ontology is conceptualized to serve as a high level schemata (a clinical upper level ontology) with a substantially minimal set of semantic constraints that sufficiently represents major patterns identifiable in typical clinical text, and enables extensions and mappings to more specialized ontologies to meet particular requirements of a new use case or domain. The semantic ontology 136 may also provide mapping points for importing new semantic or syntactic concepts, or dynamic extension to meet requirements of a new type of document or domain (for example to add concepts pertaining to medications and prescriptions, in a model originally intended to capture vital signs and physical exam data).

A semantic ontology 136 may include concepts such as clinical text and its different types such as chief complaint, relationships with presenter (for example, patient, nurse, EMS personnel, etc.), clinical observation (for example, sign, syndrome, disease, procedure, etc.), and their locus (for example, body site or region, body part, etc.), modifiers (for example, QualitativeModifier and QuantitativeModifer), clinical contexts (for example, Temporal_Context, Allergy, Causation_Context, Process_Context, Allergy_Context, History_Context, etc.), or a wide variety of other concepts.

Domain ontology 138 may be an ontology that represents domain or task specific knowledge about a particular domain that may have a variety of concepts, where the concepts may be referred to by a number of different labels. In one embodiment, domain ontology 138 may be an ontology representing the Unified Medical Language System (UMLS). UMLS is a compendium of many controlled vocabularies in the biomedical sciences. It provides a mapping structure among these vocabularies and thus allows one to translate among the various terminology systems; it may also be viewed as a comprehensive thesaurus and ontology of biomedical concepts. It is intended to be used mainly by developers of systems in medical informatics. UMLS includes the following components: the Metathesaurus (UMLS-MTH) (instances of types), the core database of the UMLS, which is a collection of concepts and terms abstracted from the various controlled vocabularies, and their relationships; and the Semantic Network (UMLS-SN) (concepts/types such as events, entities, etc.), a set of concepts and relationships that are used to classify and relate the entries in the Metathesaurus. In the current version of the UMLS Semantic Network (SN) there are 135 Semantic Types (nodes) that are networked through 54 Semantic Relationships (links).

Domain ontology 138 may have been created based on a simple knowledge organization system (SKOS) model (UMLS-SKOS) developed to represent the UMLS-MTH schemata and the UMLS Semantic Network (UMLS-SN) and all relationships extractable from the combination. The UMLS-SKOS may thus be an OWL ontology that partially but consistently adopts the UMLS-SN for Semantic Web applications. This ontology may thus enable the informatics system 110 to classify, infer or retrieve concepts in the domain ontology 138 based on UMLS-SN. The UMLS-SN may be extended inside the UMLS-SKOS ontology with properties to assert correspondence of concepts from any ontology or SKOS concepts from other non UMLS source vocabularies with UMLS-SKOS.

The contribution of UMLS-SKOS ontology to the informatics system is to convert UMLS knowledge sources into a formal graph representation that can be mapped easily and readily to any other formal graph for contextualization and mining.

Specifically, in one embodiment, UMLS-MTH concepts are assigned at least one Semantic Type with the most specific semantics in the UMLS-SN hierarchy. Semantic Types contextualize UMLS-MTH concepts with textual annotations that define their types, and place them in an ‘isa’ hierarchy. The ontology maps each Semantic Type into a corresponding owl:Class and each UMLS Semantic Relationship into an owl:ObjectProperty. Concepts and properties in this model have rdfs:subClassOf and rdfs:subPropertyOf relationships, respectively, when there is an ‘isa’ relationship in UMLS.
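The mapping rule in the preceding paragraph may be sketched as a small function that emits triples. The owl: and rdfs: terms are those named above; the function name, the sn: prefix and the sample Semantic Types and Semantic Relationships are illustrative assumptions and are not taken from a UMLS release.

```python
def semantic_network_to_owl(semantic_types, relationships, isa_links):
    """Emit triples for a Semantic Network under the mapping rule described above.

    semantic_types and relationships are collections of identifiers;
    isa_links is a list of (child, parent) pairs.
    """
    triples = []
    for name in semantic_types:
        triples.append((name, "rdf:type", "owl:Class"))
    for name in relationships:
        triples.append((name, "rdf:type", "owl:ObjectProperty"))
    for child, parent in isa_links:
        # An 'isa' link between types becomes subClassOf; between relationships, subPropertyOf.
        predicate = "rdfs:subClassOf" if child in semantic_types else "rdfs:subPropertyOf"
        triples.append((child, predicate, parent))
    return triples

# Illustrative identifiers only.
types = ["sn:PathologicFunction", "sn:DiseaseOrSyndrome"]
relations = ["sn:affects", "sn:treats"]
isa = [("sn:DiseaseOrSyndrome", "sn:PathologicFunction"), ("sn:treats", "sn:affects")]

owl_triples = semantic_network_to_owl(types, relations, isa)
```

A reasoner operating over the resulting triples could then treat an instance of the more specific type as an instance of the more general type.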

In the domain ontology 138 each UMLS-MTH concept represents a resource with a uniform resource identifier (URI) constructed using a NameSpace:CUI schema, where NameSpace can represent any unique URL such as ‘umls=http://nih.nlm.gov/umls/’. All UMLS-MTH concepts may be conceptualized to be instances of (rdf:type) the concept representing its associated Semantic Type. The semantics of each UMLS-SKOS resource (each UMLS-MTH concept) is defined by its source and through a variety of means: by a textual definition or annotation; by its Semantic Type and its place in the hierarchy; by source defined relationships between concepts; and by terminological relationships between terms (hyponymy, hypernymy, synonymy, etc.) defined by the UMLS-MTH. There are, for example, major groupings of Semantic Types incorporated in the UMLS-SN and therefore in the domain ontology 138, for organisms, anatomical structures, biologic functions, chemicals, events, physical objects, and concepts or ideas. The creation of UMLS-SKOS for use as a domain ontology 138 will be discussed in more detail later herein.
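The NameSpace:CUI schema may be illustrated with the following sketch, which uses the example namespace quoted above. The function name, the CUI format check (the letter C followed by seven digits) and the placeholder identifier C0000000 are assumptions made for illustration and do not denote any particular concept.

```python
import re

# Prefix and URL taken from the example namespace 'umls=http://nih.nlm.gov/umls/'.
UMLS_NAMESPACE = ("umls", "http://nih.nlm.gov/umls/")

# Assumed CUI shape: the letter C followed by seven digits.
CUI_PATTERN = re.compile(r"^C\d{7}$")

def concept_uri(cui, namespace=UMLS_NAMESPACE):
    """Return the prefixed name and the expanded URI for a concept identifier."""
    if not CUI_PATTERN.match(cui):
        raise ValueError(f"not a well-formed CUI: {cui!r}")
    prefix, base = namespace
    return f"{prefix}:{cui}", base + cui

# C0000000 is a placeholder, not a reference to an actual concept.
short_form, full_uri = concept_uri("C0000000")
```

Here short_form is the NameSpace:CUI form and full_uri is the same identifier with the namespace expanded.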

Each UMLS-MTH concept is provided with a unique concept identifier (CUI) that is used as a mapping point between concepts from multiple source vocabularies. Any textual representation or ‘atomic term’ used by a source vocabulary to refer to a biomedical concept also has its own unique identifier (AUI). A CUI may be linked to multiple AUIs from the same or different source vocabularies (SABs). The UMLS-MTH also contains all relationships that a source vocabulary may have defined or described between concepts or between terms. This qualifies the UMLS-MTH as a rich and expressive source of terminology for biomedical and clinical concepts. However, the UMLS-KS as is cannot be readily used or queried by a semantic application, as the semantics of the relational schemata used to construct the UMLS-KS are implicit and not available for mapping or real time inferences for information retrieval and querying by semantic applications.

In another embodiment, the informatics system may use the GALEN ontology from the openGALEN project as the domain ontology and formal clinical model, or any other domain ontology that formally and properly defines clinical concepts and their labels and relationships with each other within that domain. The domain ontology, once mapped to the semantic model, is used by the informatics platform to provide context for interpretation of obtained data and parse graphs that are mapped to the semantic and syntactic ontology.

A source ontology 140 may comprise a representation of the structure of data received from a data source or the like and the type of data comprised by that data source. As will be discussed in more detail later, this ontology may be created and updated automatically by the informatics system based on received structured data using a core schema ontology (CXM) and a datatype ontology. In one embodiment, concepts in a source ontology 140 may be mapped to concepts in a domain ontology 138.

Surveys 152 may be graph representations of a data collection instrument created by a user. Surveys 152 may serve to expand the survey ontology 134 (for example, by forming a unified graph with the survey ontology) by representing specific instances of concepts defined in the survey ontology 134 or representing new concepts which it is desired to create. Thus, a survey may specify specific instances, or types, of concepts defined in the survey ontology 134. For example, survey ontology 134 may define a “Question” concept. A survey 152 will define an individual object of type “Question” which asks “Has a Blood Transfusion been performed?”. It will then create, if not already present, the concept of “Blood Transfusion” and map the question object to it, which will provide meaning to the individual object and enable its mapping to other concepts. Hence the question “Has a Blood Transfusion been performed?” will be mapped to the concept of “Question” in the survey ontology 134, which enables the system to serve it to a client application.

A survey 152 may also represent new concepts that were previously not defined in an ontology, such as, for example, the concept of a “Blood Transfusion” or a value of an answer (for example “Yes” or “No”). Such concepts may be mapped to one or more concepts in the domain ontology 138. Specifically, in one embodiment, when a user defines a concept the domain ontology 138 may be searched (for example, using the MetaMap or MetaMap Transfer (MMTx) algorithm) to determine if any concepts in the domain ontology are associated (for example, over a certain score) with this newly defined concept. If any such concepts are found in the domain ontology the user may be given the option to map the newly defined concept to one or more of the found concepts.
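The threshold-based search described above may be pictured with the following sketch. It does not implement MetaMap or MMTx; a generic string similarity from the Python standard library (difflib) is substituted purely to show how candidates scoring over a threshold could be offered to the user. The domain labels, identifiers, threshold value and function name are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical labels standing in for concepts of a domain ontology.
DOMAIN_LABELS = {
    "dom:BloodTransfusion": "Blood Transfusion",
    "dom:BoneFracture": "Bone Fracture",
}

def candidate_mappings(new_label, labels=DOMAIN_LABELS, threshold=0.6):
    """Return domain concepts whose label similarity meets the threshold, best first.

    SequenceMatcher is a stand-in for a real concept-matching algorithm.
    """
    scored = []
    for uri, label in labels.items():
        score = SequenceMatcher(None, new_label.lower(), label.lower()).ratio()
        if score >= threshold:
            scored.append((score, uri))
    return [uri for score, uri in sorted(scored, reverse=True)]

# Candidates that would be offered to the user for a newly defined concept.
candidates = candidate_mappings("blood transfusion")
```

The user would then choose which, if any, of the returned candidates to map the newly defined concept to, so the search only proposes mappings and never asserts them on its own.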

It will be apparent that a survey 152 is extensible. Also it will be apparent that concepts in the survey may be mapped to other concepts in other ontologies. For example the concept of the question “Has a Blood Transfusion been performed?” may be mapped to the concept of “Infusion” in some other ontology. As can be seen then, a unified graph may exist between, for example, survey ontology 134, domain ontology 138, etc. Examples of such surveys and this type of mapping will be discussed in more detail later in this disclosure.

Source data 154 may comprise graph representations of data received as structured data from a data source. This data may be instances of a concept defined in the source ontology 140 corresponding to the data source from which the structured data was received (and that may have been constructed automatically by the informatics system based on the same structured data). Thus, a unified graph may exist between source data 154 and the source ontology 140. Furthermore, if, as discussed above, the source ontology 140 is mapped to a domain ontology 138, a unified graph may exist between the source ontology 140, the source data 154 and the domain ontology 138. Examples of such source ontologies 140, source data 154 and this mapping will be discussed in more detail later in the disclosure.

Survey responses 156 are graph representations of the responses to surveys 152 obtained from users at data sources 100. These responses may be instances of a concept defined in the survey ontology (for example, a question response concept) and may be associated with the question to which the response corresponds. For example, a “Yes” response to the question “Has a Blood Transfusion been performed?” may be represented as an object that is an instance of the question response concept mapped to the concept representing the question “Has a Blood Transfusion been performed?” (“Blood Transfusion” in this case) and the object representing the value “Yes”. As can be seen then, a unified graph may exist between survey responses 156, survey 152, survey ontology 134, domain ontology 138, etc. Examples of such survey responses 156 and this mapping will be discussed in more detail later in the disclosure.

Text data 158 may comprise a graph representing text obtained by the informatics system 110. A graph representing text data may be mapped to domain ontology 138 or semantic ontology 136 such that a unified graph exists between these graphs. Such a graph representation may be produced as a result of the parsing of clinical text based on syntax ontology 135.

Informatics system 110 may utilize ontologies 132 and source data 150 in a variety of functions. These functions may include the implementation of a survey on demand system (SODS) module 160, a clinical text understanding (CTU) module 180, a structured data to ontology module 140 and a data mining module 190. SODS module 160 allows for data collection from users at various client devices 100 executing a client application 102.

SODS module 160 may include a survey design module 162, a survey distribution module 164 and a survey response module 166. Survey design module 162 may allow a survey to be constructed based on one or more ontologies 132, including the creation of new concepts in conjunction with the creation of the survey and value sets representing the values of potential answers to questions. More specifically, the survey design module 162 may utilize survey ontology 134 to allow a user to create a survey based on one or more concepts in the survey ontology 134 (for example, by creating specific instances of concepts in the survey ontology 134) or to add concepts in conjunction with the creation of the survey, including concepts pertaining to the question and concepts pertaining to a value set comprising the values of potential answers to a question. The survey design module 162 may also allow concepts associated with the survey, such as values of a value set, to be mapped to concepts in another ontology, for example domain ontology 138. Thus, the survey created by the user (including any new concepts defined by the user) is a graph which represents the survey and concepts created by the user. The survey is mapped to the survey ontology 134 and thus a unified graph is formed between any survey 152 created by the user, the survey ontology 134 and the domain ontology 138. In this way, not only can surveys be created by the user, but the concepts defined by the user may be used to extend the survey ontology 134 (through the mapping between the graph representing the survey created by the user and the survey ontology 134).

A survey 152 can then be distributed to users on client devices 100 which are executing a client application 102 associated with SODS module 160 using survey distribution module 164, which may employ a network service such as a web service or the like to distribute the survey to a client application 102. Client application 102 may be web based (for example, executed on a browser at the client and downloaded via a request to informatics system 110), a resident application, etc., that communicates through an architecture provided by the informatics system 110 (for example, a services architecture or the like). Client application 102 may access survey distribution module 164 and provide some form of user credentials. These credentials may serve to identify the user of the device 100 utilizing the client application 102. The client application 102 may also identify any surveys which have been previously received and stored on the device 100.

In response, the survey distribution module 164 may identify any surveys 152 to be delivered to the client application 102. These surveys 152 may be identified based on the user credentials, demographic data, or other types of data associated with a user that may be determined based on the user credentials received or otherwise determined by the ontology. The surveys identified may be new surveys (not previously provided to the client application 102) or may be updated versions of surveys previously provided to the client application 102. The survey distribution module 164 may then deliver one or more of these surveys to the client application 102. The client application 102 may also cache interactions internally and securely when an online service from informatics system 110 is not available and resume communication when connectivity is established again.

The client application 102 can render an interface at the client device 100 to present the questions of the survey to the user based on the survey and send the user's responses to these questions to survey response module 166. Survey response module 166 may be configured to validate and store responses received from client application 102 as a survey response graph 156. More specifically, the response module 166 may receive the responses from the client application 102, create instances of a concept for a question response for each response and map the question response to a value of the value set associated with the question. The question response may also be mapped to a variety of other concepts, such as, for example, a concept representing the change history of the value, the time a value has changed, etc. By mapping the question responses to the questions themselves, or other concepts, a unified graph is created between the survey 152 itself, the survey responses 156, the survey ontology 134 and the domain ontology 138. Such a unified graph enables the response data to be retrieved based on the survey design (questions and their answers) or based on the concepts and their relationships from the ontology(s) (for example, people and their diseases).
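By way of non-limiting illustration, the validation and storage of a response described above may be sketched as follows. The sketch assumes the graph is held as a set of (subject, predicate, object) triples; the predicate names ex:answers and ex:hasValue, the concept name sods:QuestionResponse, the example URIs and both function names are hypothetical and are not taken from the disclosure.

```python
# Illustrative sketch only: a survey response graph held as a set of triples.
# All URIs, predicates and function names below are hypothetical.

def store_response(graph, response_id, question, answer, value_sets):
    """Validate an answer against the question's value set, then store it."""
    allowed = value_sets[question]  # maps answer label -> value concept URI
    if answer not in allowed:
        raise ValueError(f"{answer!r} is not in the value set of {question}")
    # Create an instance of the question response concept ...
    graph.add((response_id, "rdf:type", "sods:QuestionResponse"))
    # ... map it to the question it answers ...
    graph.add((response_id, "ex:answers", question))
    # ... and map it to the chosen value of the value set.
    graph.add((response_id, "ex:hasValue", allowed[answer]))
    return response_id


def responses_to(graph, question):
    """Retrieve the stored values for a question by walking the graph."""
    ids = {s for s, p, o in graph if p == "ex:answers" and o == question}
    return sorted(o for s, p, o in graph if s in ids and p == "ex:hasValue")


value_sets = {
    "ex:q/bloodTransfusion": {
        "Yes": "ex:BloodTransfusion/Yes",
        "No": "ex:BloodTransfusion/No",
    }
}
graph = set()
store_response(graph, "ex:response/1", "ex:q/bloodTransfusion", "Yes", value_sets)
stored = responses_to(graph, "ex:q/bloodTransfusion")
```

Because the response, the question and the value are all nodes of one graph, the same triples may be queried either from the survey side (by question) or from the ontology side (by value concept).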

Moving now to the clinical text understanding (CTU) module 180, CTU module 180 may comprise an interface module 181, a parser 182, a syntactic mapper 184, a semantic mapper module 186 and a domain mapper module 188. The CTU module 180 may receive clinical text through the interface module 181. This clinical text may take a variety of forms, including text transcribed from a doctor's or nurse's notes or charts, text from an EMR or other type of medical record, notes from a clinical trial, or text from almost any other source desired.

Parser module 182 is configured to utilize syntax ontology 135 to parse the received text and may be configured to accomplish such parsing regardless of whether such clinical text has a well formed syntax or grammatical representation. Such a parser may not be dependent on the syntax of language, as the use of chunks (tokens) and a moving window may account for cognitive aspects of how humans read text, as will be discussed in more detail later. Accordingly, such a parser may be utilized effectively, even with grammatically incorrect or structurally aberrant text (often produced by doctors).

Parser module 182 may create text data 158 that may include a parse graph for the received text. A parse graph is a graph representing the received clinical text that comprises concepts representing the tokens in the clinical text and their relationships to one another, including the order of the tokens and their string representation. In other words, an instance of a concept in the syntax ontology 135 may be created and associated with the value for a token. Thus, the concepts representing the tokens of the clinical text may be associated with corresponding concepts of the syntax ontology 135, as the parse graph generated by the parser module 182 may be mapped to the syntax ontology 135. By mapping the parse graph to the syntax ontology, a unified graph is created between the parse graph and the syntax ontology 135.

Domain knowledge mapper module 188 may determine a corresponding concept in the domain ontology 138 for each token in the parse graph. This can be done using any search algorithm, such as but not limited to the MetaMap mapping algorithm, to locate a concept in the domain ontology 138 (for example, URI then type of that URI) associated with each token of the parse graph. The concept in the parse graph representing that token can then be mapped to the associated concept located in the domain ontology 138. By mapping the concepts of the parse graph to an associated concept located in the domain ontology, a unified graph is created between the parse graph for the clinical text and the domain ontology 138.

Semantic mapper module 186 may then use the unified graph of the parse graph and the domain ontology 138 to map concepts in the parse graph to concepts in the semantic ontology 136. More specifically, for each of the tokens in the parse graph the semantic mapper module 186 may determine an associated concept in the domain knowledge base. The semantic mapper module 186 can then determine if a mapping exists between the concept in the domain ontology 138 and the semantic ontology 136. If such a mapping exists, the semantic mapper module 186 may map the concept in the parse graph to the concept in the semantic ontology. In this manner, a unified graph is created between the parse graph for the clinical text, the domain ontology 138 and the semantic ontology 136.
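As a simplified, non-limiting sketch of the two mapping passes described above, the following assumes that the domain ontology lookup can be approximated by an exact-match dictionary; it stands in for, and is much simpler than, a search algorithm such as MetaMap. The prefixes parse:, maps:, dom: and sem:, the example concepts and the function name are hypothetical.

```python
# Illustrative sketch only: token -> domain concept -> semantic concept.
# The dictionary lookup is a stand-in for a real concept search algorithm.

def map_tokens(tokens, domain_index, domain_to_semantic):
    """Build parse-graph edges and link tokens to domain and semantic concepts."""
    edges = []
    for position, token in enumerate(tokens):
        node = f"parse:token{position}"
        # The parse graph keeps the order of the tokens and their string values.
        edges.append((node, "parse:position", position))
        edges.append((node, "parse:text", token))
        domain_concept = domain_index.get(token.lower())
        if domain_concept is None:
            continue  # no domain concept found for this token
        edges.append((node, "maps:domain", domain_concept))
        # Map to the semantic ontology only if the domain concept has a mapping.
        semantic_concept = domain_to_semantic.get(domain_concept)
        if semantic_concept is not None:
            edges.append((node, "maps:semantic", semantic_concept))
    return edges


domain_index = {"metronidazole": "dom:Metronidazole", "fever": "dom:Fever"}
domain_to_semantic = {"dom:Metronidazole": "sem:Substance"}
edges = map_tokens(["pt", "given", "Metronidazole", "fever"],
                   domain_index, domain_to_semantic)
```

In this sketch the token "fever" is linked to a domain concept but not to a semantic concept, mirroring the case in which no mapping exists between the domain ontology and the semantic ontology.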

Referring now to structured data to ontology module 120, this module may comprise a schema parser module 122, a structured data to RDF mapping module 124, an ontology modeler module 126, an ontology populator module 128 and an interface module 121. The structured data to ontology module 120 may receive structured data (for example, data in an XML document or data formed according to a database schema of a data source) through the interface module 121. The structured data to ontology module 120 may process this structured data to create a source ontology 140 to represent the structure and type of the data received. Using this source ontology 140, a graph representing the actual data received may be constructed (for example, a source data 154 graph). Thus, a unified graph between the source ontology 140 and the graph representing the received data is formed. In some embodiments, the concepts of the constructed source ontology 140 may be mapped to concepts in domain ontology 138 manually or using automated algorithms like the MMtx algorithm. Thus, the unified graph formed may comprise not only the source ontology 140 and the source data graph 154 constructed based on the received data but the domain ontology 138 as well. In this manner, the received data may be mined by querying the unified graph according to the concepts and relationships of the domain ontology 138.

In one embodiment of the system, once the mapping between source ontology and domain ontology concepts is established (automatically or manually), the system would replace the source ontology concepts with the domain ontology concepts and populate the domain ontology using data from the structured data instead of populating the source ontologies. This may improve the mapping and facilitate the mining of the resulting unified graph according to an existing domain ontology.

More particularly, once structured data is received at the interface 121, the schema parser module 122 may use a core schema ontology to parse received structured data from a data source to create a source specific schema model (XMODEL) corresponding to the data source from which the structured data was received. In one embodiment, the XMODEL translates the schema of the structured data into a formal and explicit graph that a computer system can query and interpret. It does not contain the actual data contained by the structured data (only a formal representation of the data model that can be extracted from the structured data). In some embodiments of the system, the XMODEL may be updated by human experts to make configurations and add mapping information for use by future processes. Structured data to RDF mapping module 124 may utilize the XMODEL to automatically create a graph representation of the received structured data. This graph representation may be an RDF representation of the structured data based on the descriptions in the XMODEL. Ontology modeler module 126 may use this graph representation to create a source ontology 140 corresponding to the data source from which the structured data was received. Ontology populator 128 may utilize the source ontology and the graph representation of the structured data received from the data source to construct a graph representation of the actual data received from the data source, where the graph representation of the actual data received from the data source is mapped to the created source ontology 140.
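To illustrate the distinction drawn above between a schema model that carries no data and a graph of the actual data, the following non-limiting sketch derives both from a small XML document using Python's standard xml.etree.ElementTree module. The triple layout, the predicate names hasChild, instanceOf and value, and both function names are assumptions made for illustration and do not reproduce the XMODEL format itself.

```python
# Illustrative sketch only: structure-only model versus data graph for XML.
import xml.etree.ElementTree as ET


def schema_model(root):
    """Collect parent/child element relationships: structure only, no values."""
    model = set()
    for parent in root.iter():
        for child in parent:
            model.add((parent.tag, "hasChild", child.tag))
    return model


def data_graph(root):
    """Create one node per element instance, typed by its schema concept."""
    triples = []
    counter = 0

    def visit(element):
        nonlocal counter
        counter += 1
        node = f"{element.tag}_{counter}"
        triples.append((node, "instanceOf", element.tag))
        text = (element.text or "").strip()
        if text:
            triples.append((node, "value", text))
        for child in element:
            triples.append((node, "hasChild", visit(child)))
        return node

    visit(root)
    return triples


document = ET.fromstring("<patient><name>Ann</name><age>40</age></patient>")
model = schema_model(document)   # describes the data model only
instances = data_graph(document)  # describes the received data
```

The instanceOf triples play the role of the mapping between the graph of received data and the source ontology, so that the two may be queried as one graph.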

It may be useful here to go into more detail with respect to the various methods implemented by the modules of the informatics system. Addressing first the SODS module 160, the functionality of such a SODS module may be better explained first with reference to the ontologies which it may utilize. Survey ontology 134 may be an ontology configured for the ad-hoc collection and mapping of data in a distributed and collaborative (teamwork) environment. Survey ontology 134 may enable clinical researchers, practitioners, epidemiologists, public health researchers, responders, etc. to interactively design and deploy dynamic data collection instruments (such as clinical research forms, surveys, questionnaires, data abstraction forms) on an array of hardware, software, and network platforms (web, PDA, tablet PC based) that can seamlessly operate in a collaborative, multi-organizational environment regardless of the continuous availability of a reliable communication network.

Survey ontology 134 may be a unified graph comprised of multiple sub-graphs, where each sub-graph is configured to enable a competency by representing the concepts and relationships associated with that competency. A portion of such a survey ontology is depicted in FIG. 3. It should be noted here that the sub-graphs, competencies, concepts, relationships, ontologies, etc. depicted herein are to serve as examples only and that other ontologies, sub-graphs, competencies, concepts, relationships, etc. may be imagined and implemented based upon the context in which embodiments of the informatics system 110 are implemented and the desired functionality of the informatics system in these embodiments.

Here, survey ontology may comprise a sub-graph 310 for the project management competency (for example, comprising concepts such as users, groups of users, sites, surveys, etc.). Here, for example, the concepts of users, groups, projects, sites, devices and operating systems are depicted along with the relationships between these various concepts. Sub-graph 320 represents a form template, and comprises concepts such as a form, a question, a value set for an answer, etc. Notice that the form template concept is related to the survey concept of the project management sub-graph 310. Sub-graph 330 comprises the concepts for the graphical rendering of the concepts in the form template, including, for example, concepts related to the appearance of a question in a survey (for example, radio, checklist, checkbox, combo, etc.), the concepts of the type of input values that the interface will present (for example, an enumerated value, a string, a numeric value, etc.) and the concept of the style that the question is to be presented in (including, for example, the concepts of color and font). Notice that the question concepts in the form template sub-graph 320 are related to concepts in the sub-graph 330. Thus a question may be related to the concepts that describe how to render that question for presentation.

The survey ontology 134 may also be expanded by a user of the informatics system 110, for example during the creation of a survey. When defining a question for a survey the user may define a concept associated with the question if the concept does not already exist in the survey ontology 134. The concept defines the value set of answers to the question based on the newly defined concepts. In the example depicted, the question in the sub-graph 320 is related to the concept of “Blood Transfusion” (for example, a context) in the sub-graph 340, which is related to the concept of a Boolean value set and the concepts of the values “Yes” and “No.” In this manner, a user may create new sub-graphs of concepts, value sets and values, and these sub-graphs may be unified with the survey ontology 134 to extend the survey ontology 134.

The concepts related to questions and the concepts representing the potential answers may be linked to one or more concepts in a domain (or other) ontology, to unify the survey ontology 134 with a domain ontology 138. As depicted in FIG. 3, the concept of “Yes” for the concept “Blood Transfusion” is mapped to a concept unique identifier (CUI) or URI in the domain ontology 138 (in this example, UMLS-SKOS) associated with the label “Therapeutic or Preventative Procedure” and the associated concepts in each of the various sources (for example SNOMED, LNC, etc.). Specifically, in one embodiment, when a user defines a concept the domain ontology 138 may be searched (for example, using the MetaMap algorithm) to determine if any concepts in the domain ontology are associated (for example, over a certain score) with this newly defined concept. If any such concepts are found in the domain ontology 138, the user may be given the option to map the newly defined concept to one or more of the found concepts.

FIG. 4 depicts one embodiment of a method employed by SODS module to gather and mine data based on such a survey ontology. At step 410 a user may create a survey based on a survey ontology. More specifically, an interface may be presented to a user to allow a user to create a survey. A survey may be a data collection form based on the concept of a form template. Each form template is in turn a reusable collection of questions (mapped to question concepts) that can be shared or used by several surveys. Each question may be mapped to a context concept and to concepts related to a set of values that define answers for that question. Questions may also be mapped to other questions such that if a particular value from the set of values that define answers for that question is provided by a user, a set of associated questions may be presented to the user. The set of new questions related to each value may be predetermined and mapped at design time or inferred at run time based on the constraints entered in the survey ontology. Furthermore, the user may be given the opportunity to define new concepts to expand the survey ontology and to map these newly defined concepts to concepts of the domain ontology.

It may be helpful here to discuss the creation of such surveys and the ontologies involved in the creation of such surveys. As mentioned, a survey may comprise a form for the collection of data. A survey may be a form based on a “form template” concept, where each form template may comprise a collection of questions. FIG. 5 depicts an embodiment of an interface that shows the composition of a form. Notice that the form depicted in FIG. 5 is based on the concept “sods:FormTemplate,” and is comprised of a number of questions including an instance of “sods:DateTimeQuestion”.

FIG. 6 depicts one embodiment of an interface which shows the definition of a single enumerated question in conjunction with a survey. An enumerated question may be an instance of the question concept. An enumerated question can be mapped to concepts that define the set of values that can be provided as an answer, concepts that define its semantics (context), concepts that define how the question is to be presented in a user interface layout, etc. A question may also be mapped to the form templates to which it belongs or to form templates from which the question was copied.

FIG. 7 depicts one embodiment of an interface which shows the linking of an enumerated question to concepts that define valid value sets for the question. Enumerated questions are linked to a concept in the survey ontology that defines their valid value sets, that is, the response ranges that are valid for that question. In this example the concept of antibiotics incorporates 38 different valid responses for any question that asks about Antibiotics. Each value in the value set (each option for an answer to the question) may be further defined and mapped by an individual URI in the survey ontology such that a mapping (for example, using the concept sods:links) can be established with another ontology (for example, a domain ontology) to further specify its semantics. For example, in this case the option Metronidazole is mapped to a URI that maps it to a UMLS-CUI (for example, a CUI in the UMLS-SKOS ontology) that is associated with the National Institutes of Health (NIH) definition of Metronidazole.

One embodiment of the mapping between value sets and an ontology is depicted in FIG. 8. In this example, the URI of an answer in the survey ontology is mapped to a CUI of the UMLS-SKOS domain ontology which is, in turn, mapped to definitions in a set of source vocabularies.

FIG. 9 depicts one embodiment of an interface which shows the concept assigned to an enumerated question, where the concept defines the context of the answers. Once an answer is provided for a question, it may become an instance of this context concept. In this manner, if the context concept is mapped to another ontology or defined formally, all responses to that question will inherit that mapping. Furthermore, several different questions that are mapped to the same context may be treated as the same question, even if they have different titles or are mapped to different interface concepts. Responses to several questions across different projects and different forms can thus be integrated with each other by mapping them to the same context.
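The integration of responses through a shared context concept may be sketched, purely by way of example, as a grouping step. The question identifiers, the context name ex:Antibiotics and the function name below are hypothetical; the only behavior illustrated is that questions bound to the same context contribute to one pool of responses.

```python
# Illustrative sketch only: integrating responses by shared context concept.
from collections import defaultdict


def responses_by_context(question_context, responses):
    """Group (question, value) responses under the context of each question."""
    grouped = defaultdict(list)
    for question, value in responses:
        grouped[question_context[question]].append(value)
    return dict(grouped)


# Two differently titled questions from different projects share one context.
question_context = {
    "projectA:q7": "ex:Antibiotics",
    "projectB:q2": "ex:Antibiotics",
    "projectB:q3": "ex:BloodTransfusion",
}
responses = [
    ("projectA:q7", "Metronidazole"),
    ("projectB:q2", "Metronidazole"),
    ("projectB:q3", "Yes"),
]
integrated = responses_by_context(question_context, responses)
```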

Turning now to FIG. 10, one embodiment of an interface which shows the definition of the “sods:DateTimeQuestion” as illustrated with respect to FIG. 5 is depicted. The Datetime question may be defined using a user interface that allows a user to expand the survey ontology. Here the Datetime question is logically defined as a generic surveyQuestion (for example, the concept of the Datetime question will be mapped to the concept of surveyQuestion) where its control templates (for example, validation and user interface characteristics) are defined by the concept of TemporalControls in the survey ontology (for example, the concept of the Datetime question will be mapped to the concept of TemporalControls). Thus, when the user accesses a survey that includes the Datetime question, it will be presented according to the concept TemporalControls, and any answer the user provides to the Datetime question may be validated according to the concept TemporalControls.

FIG. 11 depicts one embodiment of an interface displaying a configuration of a TemporalControls concept (sods:DateTimeControl) mapped to the Datetime question concept. This TemporalControls concept provides a data type validation scheme and user interface object to capture the data associated with the Datetime question. In this example, the sods:DateTimeControl concept is also linked to a specific style concept in the survey ontology that controls its layout on a GUI (for example, sodsQuestionOptionStyle).

FIG. 12 depicts one embodiment of an interface displaying a configuration of the style concept sodsQuestionOptionStyle. This style concept may serve to define an interface style for a user interface object such that any concepts mapped to the style concept may be displayed according to that style (for example, Red, 10 point, Tahoma font).

FIG. 13 is a representation of a portion of the survey ontology that includes the “sods:DateTimeQuestion” concept. More specifically, the graph in FIG. 13 represents the DateTimeQuestion as a logical definition of a generic surveyQuestion where its ControlTemplates (validation and user interface characteristics) are defined by the concept of TemporalControls as discussed above.

Any question can be linked to a frame concept (referred to as FrameConcepts) to invoke a new set of questions based on the response provided to the question, such that when a user provides a particular response to a survey question the set of questions associated with the frame concept will be presented to the user in the survey. FrameConcepts are collections of one or more other questions. For example, one can specify that on option Yes for a pregnancy question, the following 3 questions are asked: Last monoposal date, number of previous pregnancies, and whether any risk factor exists. These frame concepts may be nested, such that a response to a question presented based on a frame concept may prompt a set of questions in a nested frame concept to be presented.
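One possible way to resolve nested frame concepts at run time is sketched below. It assumes each frame is keyed by a (question, response) pair and that frames do not refer back to an earlier question (the recursion would otherwise not terminate); the question identifiers and the function name are hypothetical.

```python
# Illustrative sketch only: resolving nested frame concepts.
# A frame maps (question, response) to the follow-up questions it invokes.

def questions_to_present(question, answers, frames):
    """Return questions in presentation order, following nested frames."""
    presented = [question]
    response = answers.get(question)  # None if not yet answered
    for follow_up in frames.get((question, response), []):
        presented.extend(questions_to_present(follow_up, answers, frames))
    return presented


frames = {
    ("q:pregnant", "Yes"): ["q:previous_pregnancies", "q:risk_factor"],
    ("q:risk_factor", "Yes"): ["q:risk_factor_detail"],  # nested frame
}
all_yes = questions_to_present(
    "q:pregnant", {"q:pregnant": "Yes", "q:risk_factor": "Yes"}, frames)
answered_no = questions_to_present("q:pregnant", {"q:pregnant": "No"}, frames)
```

Here a Yes response to the pregnancy question invokes two follow-up questions, and a Yes response to one of those invokes a further nested question, while a No response invokes none.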

In one embodiment, questions may be the atomic units of data collection. Each question may be responsible for collecting a single, unambiguous, well-formed and valid value. A URI may be associated with, and thus utilized to identify, a particular question. The use of such a URI may enable the identifying, reusing, moving, merging, cloning, copying, activating, versioning, tracing, logging and mapping of questions (and their responses) across surveys. It also enables the comparing and typing of questions to each other to identify sameness or similarities of questions. Thus, this URI may be utilized to establish continuity of the data collection and to establish a basis for integration of similar data from past or future data collection or an import process.

Additionally, each question may be associated with a context representing at least one ‘context of use’. A context concept represents the ‘meaning’ of a question. A context may be a concept created in, or imported into, SODS from existing ontologies. Binding a question to a context concept, and thus to an unambiguous, formal and unique concept, makes a question unambiguous, traceable and uniquely identifiable, although it may be reused in many different ways and presented in different ways on different user interfaces. By associating a context with a question it is possible to unambiguously distinguish between questions (and their associated responses) at the time of querying or integration with existing data. Forms (referred to also as form templates) are containers that organize a set of questions into a single unit for data collection interaction, with the result being a survey. In other words, a survey is based on a form template concept.

In conjunction with the ability to define questions for a survey, the survey ontology may also provide a question response concept, where the question response concept may be mapped to the concept of the question and the context of the question. FIG. 14 depicts one embodiment of a portion of a survey ontology that comprises a question response concept, which will be mapped to an answer when such an answer is provided in response to the question. FIG. 15 depicts one embodiment of a graph with response concepts for a survey with two questions, where none of the questions has any invocations but both questions may have answers from a range of predetermined URI(s) associated with concepts in an existing graph. FIG. 16 depicts one embodiment of a graph comprising response concepts for a question that has invoked two other questions.

As discussed above, an informatics system may be able to create a format for the storage of concepts and relationships created using the SODS module of an informatics system. FIG. 17 depicts a graph representing the relationships between relational database objects and concepts that may be used by the SODS module. This graph may enable a program, script, etc. to construct a relational database schema to store data from the graph representation used by the informatics system, including concepts representing the questionnaire structure, question responses, their relationships to each other, etc. Such a program or script may identify changes in an existing schema needed to persist all data points collected through an RDF graph.

One embodiment of a method for the construction and population of such a relational database schema is depicted in FIG. 18. Here, the currently existing survey ontology may be loaded as a graph (for example, represented in OWL). A difference may then be determined between the currently existing database and the newly updated ontology. The old schema may be retracted from the database and a new schema corresponding to the newly updated ontology may replace the old schema. FIG. 19 depicts a listing of a relational database schema that may be constructed from a graph used by a SODS module.
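The difference determination described above may, in a simplified and non-limiting form, be sketched as a comparison between the tables and columns of the existing schema and those implied by the updated ontology. The representation of a schema as a dictionary of table names to column sets, the table and column names, and the function name are all assumptions for illustration; the sketch does not reproduce the method of FIG. 18.

```python
# Illustrative sketch only: comparing an existing relational schema with the
# schema implied by an updated ontology. Schemas are {table: set(columns)}.

def schema_difference(existing, desired):
    """Report tables to create, tables to drop and columns to add."""
    tables_to_create = sorted(set(desired) - set(existing))
    tables_to_drop = sorted(set(existing) - set(desired))
    columns_to_add = {
        table: sorted(desired[table] - existing[table])
        for table in desired
        if table in existing and desired[table] - existing[table]
    }
    return tables_to_create, tables_to_drop, columns_to_add


existing_schema = {
    "question": {"uri", "title"},
    "legacy_form": {"uri"},
}
desired_schema = {
    "question": {"uri", "title", "context"},
    "question_response": {"uri", "question_uri", "value_uri"},
}
create, drop, add = schema_difference(existing_schema, desired_schema)
```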

In one embodiment, the data store may be at least partially configured as a relational database schema configured to embody data that is represented as a formal graph. Specifically, there may be a database configuration module (not shown) that can evaluate surveys to construct one or more database schema types to store the survey responses, each for a specific use case and specialized purpose. For example, in one embodiment the following schemas may be generically computed for every survey response:

a. RDF model: all survey responses may be well formed RDF documents when they are received (as discussed later) and can easily be added to data store 130, which may be configured as a triple store. However, one or more transformations may occur prior to storage of an RDF survey response to the data store. One transformation may assign a URI to the response based on whether the response is associated with a context that is an identifier as described above. The method will ensure valid objects (for example, responses, questions, etc.) are found and associated with those URIs at the time of insertion into the data store such that queries to describe those URIs can retrieve proper data substantially immediately after insertion of the new responses.

Accordingly, an RDF view of the data may be a globally integrated and unified view of all surveys from all projects that can be navigated or mined from multiple perspectives, as the RDF transformation process plus the URI assignment mechanism result in a unified graph (within the RDF model), as long as the contexts associated with the survey ontology are used and mapped consistently and properly throughout the life of the system.

b. Standard Relational DB for Online Transactional Systems (OLTP): In one embodiment a parser algorithm will inspect questions and relationships between questions associated with each survey to construct a default relational schema for each survey. As a result, these database schemas may be relational schemas that are immediately useful for online transactional processing.

c. Rectangularized DB (Spreadsheet): In one embodiment, all relational links associated with a survey in this view are collapsed into a single table that turns all one-to-many relations into an iterative set of columns in the same table. That is, the normalized structure of the relational schema constructed in the previous model is denormalized into one big rectangularized schema that encompasses all relations and fields (columns), repeated as many times as necessary in the same table.

d. Multidimensional Databases (CUBE representations): In one embodiment all concepts mapped to enumeration Questions are considered dimensions of a multidimensional database, all numerical question types are considered measures in a multidimensional database and all Identifier Questions are considered reportable (countable) entities of a multidimensional database. A computer algorithm can then parse through the RDF graph and either construct a star schema relational database readily available for CUBE processors or directly implement a CUBE inside analytic engines such as Microsoft Analytic Server.
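Two of the schema types listed above lend themselves to a short, non-limiting sketch: the collapsing of a one-to-many relation into repeated columns (item c) and the assignment of cube roles by question type (item d). The row layout, column naming pattern, question type labels and function names are hypothetical, and the second drug name is invented for the example.

```python
# Illustrative sketch only: rectangularization and cube role assignment.

def rectangularize(parents, children, key, prefix):
    """Collapse a one-to-many relation into repeated columns of one table."""
    rows = []
    for parent in parents:
        row = dict(parent)
        related = [child for child in children if child[key] == parent[key]]
        for index, child in enumerate(related, start=1):
            for column, value in child.items():
                if column != key:
                    row[f"{prefix}{index}_{column}"] = value
        rows.append(row)
    return rows


# Question type -> role in a multidimensional database, per item d above.
CUBE_ROLE_BY_QUESTION_TYPE = {
    "enumeration": "dimension",
    "numeric": "measure",
    "identifier": "entity",
}


def cube_roles(question_types):
    """Assign a cube role to each question whose type has a defined role."""
    return {
        question: CUBE_ROLE_BY_QUESTION_TYPE[kind]
        for question, kind in question_types.items()
        if kind in CUBE_ROLE_BY_QUESTION_TYPE
    }


patients = [{"patient_id": 1, "site": "A"}, {"patient_id": 2, "site": "B"}]
antibiotics = [
    {"patient_id": 1, "drug": "Metronidazole"},
    {"patient_id": 1, "drug": "Amoxicillin"},
]
flat = rectangularize(patients, antibiotics, key="patient_id", prefix="antibiotic")
roles = cube_roles({"q:antibiotic": "enumeration", "q:weight": "numeric",
                    "q:patient_id": "identifier", "q:notes": "text"})
```

In the rectangularized result, a patient with two antibiotic records yields the columns antibiotic1_drug and antibiotic2_drug in a single row, while a patient with none yields only the parent columns.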

Returning now to FIG. 4, as discussed above, when a survey has been defined using a survey ontology a unified graph may be created between the survey ontology, the domain ontology and the created survey. When a user at a client device accesses the informatics system, the informatics system may select a survey to deliver to the user at step 420. More specifically, based on some criteria associated with the user or the client device accessing the informatics system (for example, user identifier, client device identifier, data associated with the user such as a clinical trial identifier, sex, location, medical data or almost any other data desired that may be provided or obtained about the user or the client device), the SODS module may select a survey to present to the user.

In one embodiment of the system all these criteria can be incorporated in the survey ontology to customize access to the system resources based on all information available to the system up to that moment by searching the unified graph as a whole (survey ontology, survey responses, question responses, user profiles, domain knowledge, etc.).

The survey may be selected by, for example, identifying a concept in the unified graph representing the user. The unified graph comprising the survey ontology, the survey and the domain ontology may be navigated starting at the concept in the graph associated with the user to determine a survey associated with the user to provide to the user. The survey provided to the user may comprise an RDF description of the portion of the unified graph comprising the concepts and relationships mapped to the selected form template, or may comprise an identifier for the form template such that an application at the client device may provide this identifier to the informatics system to obtain data (for example, concepts or relationships) corresponding to the form template as they are needed.
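The navigation of the unified graph starting at the concept representing the user may be sketched, by way of example only, as a breadth-first traversal over outgoing relationships. The triple representation, the predicates ex:memberOf, ex:worksOn and ex:hasSurvey, the type name sods:Survey and the function name are hypothetical; an actual embodiment may apply additional selection criteria.

```python
# Illustrative sketch only: selecting surveys by navigating from the user.
from collections import deque


def surveys_for_user(triples, user):
    """Return surveys reachable from the user's concept via outgoing edges."""
    outgoing = {}
    for subject, _predicate, obj in triples:
        outgoing.setdefault(subject, []).append(obj)
    surveys = {s for s, p, o in triples
               if p == "rdf:type" and o == "sods:Survey"}
    seen, queue = {user}, deque([user])
    while queue:
        node = queue.popleft()
        for neighbor in outgoing.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(seen & surveys)


unified_graph = [
    ("ex:alice", "ex:memberOf", "ex:groupA"),
    ("ex:groupA", "ex:worksOn", "ex:project1"),
    ("ex:project1", "ex:hasSurvey", "ex:survey1"),
    ("ex:survey1", "rdf:type", "sods:Survey"),
    ("ex:survey2", "rdf:type", "sods:Survey"),  # not linked to the user
]
selected = surveys_for_user(unified_graph, "ex:alice")
```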

Once the survey is received at the client device, an interface may be rendered based on the survey by the client application executing on the client device. More specifically, the interface may present the questions of the survey according to the concepts of the survey (for example, the concepts representing the questions, the concepts for the graphical rendering and style of the question concepts in the form template, the concepts of the types of value that the questions will accept, etc.).

FIG. 20 depicts one embodiment of such an interface that may be presented to a user at a client device, where the interface has been rendered based on the survey provided by the informatics system to the client device. The user may interact with the rendered interface to provide responses to the questions presented through the interface. These responses may be captured by the client application on the client device. In one particular embodiment, the data entered by the user with respect to the rendered interface of the survey is captured as RDF and associated with one or more questions of the survey.

It should be noted here that because of the architecture of the informatics system, once a survey is obtained from the informatics system the survey may be “taken” (for example, an interface associated with the survey rendered and answers obtained and stored on the client device) regardless of whether the client device is in communication with the informatics system at the time the survey is taken by the user. This capability exists because, in some embodiments, all the information needed by the client device to render the interface of the survey and capture the responses to the questions of the survey was delivered by the informatics system in the form of a self-descriptive survey graph. In other words, in one embodiment, the provided survey may comprise all information needed by a client device to present the interface for the survey and capture the response. It will be apparent, however, that other architectures are also possible. For example, the client application may obtain each question of a survey from the informatics system as it is needed to render the interface and provide answers to the question to the informatics system as they are provided by the user with respect to the interface. Other arrangements will also be possible.

In any event, once responses to the questions of the survey are captured by the client application at the client device at step 430, they may be provided to the informatics system whenever the client device is in communication with the informatics system at step 440. These responses may be provided in a response graph, which may be an RDF graph that represents the user and client device from which the responses are being provided, the date the survey was taken, the survey to which the responses were provided (for example, a survey identifier or version identifier), the answer associated with the question, the user submitting the survey, etc.

When the response graph is received, each of the responses to the questions may be validated against an expected type of response and represented in a question response concept that is associated with the question of the survey to which it is a response, as depicted in FIGS. 15-17. In this manner each of the responses to the questions of the survey is represented in a question response concept that is associated with the concept representing the question to which it is a response. In one embodiment, all responses to a question are mapped to a sub-graph that keeps track of the versioning and update history of the answer.

FIG. 21 depicts one embodiment of a question response mapped to such a sub-graph. A note concept may be associated with every new update, such that a series of time stamped notes can be attached to every update to every response to every question in every survey. Using these note concepts then, any change in an answer can be traced, logged and audited.

Responses (question responses) may be associated with values recorded by a particular user as an answer to a single question presented in an interface associated with a survey. In one embodiment, a SODS module provides a globally unique way to identify responses to questions using the same URI mechanism used to unambiguously identify and interact with questions. In one embodiment, a received response is represented by a URI that is globally unique to that instance of question response, excepting in the case where the response is an answer to a question whose context is itself a unique identifier. That is, if two responses to two questions are recorded at different times, the two answers will receive the same URI only if they point to the same question context and that question context is an Identifier concept or the question itself is an Identifier Question according to the Survey Ontology. Otherwise, each answer will receive a unique URI of its own. In other words, question response URIs are reused and recreated for those questions whose context may be used as an identifier (for example, for Social Security) or if the Question type is set to the Identifier Question.

For example, a question “Please enter your SSN:” and “Social Security Number” may be asked in two different forms A and B, in two different and independent projects, at two different times. However, if both questions are contextualized by (associated with) the same SODS context of “Social Security Number”, and if the “Social Security Number” is marked as a unique identifier of a person, the same global identifier may be assigned to a response recorded by the two distinct forms at different times. As a result, it can be identified that these forms are both about the same ‘person’, and data mining may augment, compare, integrate, etc. data about that person determined from responses to form A with data from responses to form B, although the forms were designed at different times, for different purposes.
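The URI reuse rule for identifier questions may be illustrated with the following minimal sketch. It assumes a deterministic hash of the question context and the answer value for identifier questions and a random identifier otherwise; the namespace, the function name and the use of SHA-256 are hypothetical choices for illustration and are not described in this disclosure.

```python
import hashlib
import uuid

BASE = "http://example.org/response/"  # hypothetical namespace

def response_uri(context_uri, value, is_identifier):
    """Return a response URI; identifier answers map to a stable, reusable URI."""
    if is_identifier:
        # Same context and same value always yield the same URI.
        key = f"{context_uri}|{value}".encode("utf-8")
        return BASE + hashlib.sha256(key).hexdigest()
    # Every other response receives a fresh, globally unique URI.
    return BASE + uuid.uuid4().hex

ssn_context = "http://example.org/context/SocialSecurityNumber"  # hypothetical
form_a = response_uri(ssn_context, "000-00-0000", is_identifier=True)
form_b = response_uri(ssn_context, "000-00-0000", is_identifier=True)

fever_context = "http://example.org/context/Fever"  # hypothetical
first = response_uri(fever_context, "yes", is_identifier=False)
second = response_uri(fever_context, "yes", is_identifier=False)
```

Under these assumptions, responses recorded through form A and form B resolve to the same node in the unified graph, while ordinary answers remain distinct instances.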

Question response concepts may be, in turn, collected into a concept called a survey response. FIG. 22 depicts one embodiment of a survey response concept sub-graph. Survey responses are linked to a form template concept that is in turn contextualized by a context (for example, from an ontology) concept. The link between the form template and the ontology concept can be interpreted as being an instance of that concept. Similarly, a question response may be interpreted as an instance of the context concept mapped to the question to which it is a response, and a survey response may become an instance of the context concept for the template concept associated with a survey. This enables the identification of forms, surveys and responses that are conceptually or semantically about the same real world objects or conceptual entities. For example, two different forms for collecting data ‘about Influenza’ can get linked to each other and treated similarly by an application when they both use the same context concept for their templates.

FIG. 23 is a representation of an example survey response with four questions answered. One of the questions is expanded to demonstrate the response (yes option) and the fact that it invoked a frame concept when answered with the “Yes” option. As can be seen, the depicted survey response is also an instance of a concept that represents its context (rdf:type Daily_ICU_Form_1).

Accordingly, when responses are received from a client device, these responses may be represented as question responses in a unified graph where all of the question responses are mapped to the question of the survey to which they are responses and to a survey response concept representing a response to that survey. As the survey is mapped to the survey ontology and the domain ontology, a unified graph is thus formed from the survey response, the survey, the survey ontology and the domain ontology.

The resulting unified graph may be searched at step 450 to obtain data about the responses to the surveys received from the users at the client device. In one embodiment, the interface presented to the user may provide an open framework for the user to construct queries according to the context of the domain ontology. Specifically, the interface may present the users with the set of concepts or relationships utilized in the domain ontology to allow the user to formulate queries based on these concepts and relationships. Searches can then be formed and conducted based on the domain ontology. In this manner users are provided with a highly effective and contextual method for extracting meaning from obtained data. In particular, the concepts in the domain ontology specified by the user using the interface may be used as starting points in the unified graph and the graph navigated from these starting points to determine survey data responsive to the user's query. In one embodiment, these queries formed by the user can be translated into a SPARQL query that is run against the unified graph comprising the domain ontology, survey and survey responses obtained from users to provide the user who initiated the query with data obtained from users that is relevant to the query.
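Navigation of the unified graph from a domain concept to the responsive survey data may be sketched as follows. The sketch stands in for a SPARQL query by walking an in-memory list of triples; the property names (subClassOf, hasContext, answers) and the sample graph are assumptions made for illustration only.

```python
def descendants(triples, concept):
    """Return the concept plus every concept reachable through subClassOf edges."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def responses_about(triples, concept):
    """Find question responses whose question is contextualized by the concept."""
    concepts = descendants(triples, concept)
    questions = {s for s, p, o in triples if p == "hasContext" and o in concepts}
    return sorted(s for s, p, o in triples if p == "answers" and o in questions)

# Illustrative unified graph: domain ontology, survey and survey responses.
graph = [
    ("dom:Fever", "subClassOf", "dom:Sign"),
    ("dom:Cough", "subClassOf", "dom:Sign"),
    ("q:Fever", "hasContext", "dom:Fever"),
    ("q:Cough", "hasContext", "dom:Cough"),
    ("q:Age", "hasContext", "dom:Age"),
    ("resp:1", "answers", "q:Fever"),
    ("resp:2", "answers", "q:Cough"),
    ("resp:3", "answers", "q:Age"),
]
sign_responses = responses_about(graph, "dom:Sign")
```

Starting from the broader concept dom:Sign reaches responses to both narrower questions, which reflects how a query phrased in domain terms can retrieve data collected by differently worded questions.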

Other methods of gathering and mining data may also be utilized by an informatics system. As discussed above, data may also be obtained from text based sources. FIG. 24 depicts one embodiment of a method that may be employed in conjunction with a CTU module of an informatics system to process such text. Text, such as clinical text, may be received from a data source at step 2410, parsed according to a syntax ontology to generate a parse graph at step 2420 and the concepts of the parse graph mapped to a domain ontology and semantic ontology in step 2430 to create a unified graph between the graph representing the clinical text, the domain ontology and the semantic ontology. The resulting unified graph may be searched at step 2440 to obtain data about the clinical text.

In a medical environment these sources may comprise, for example, an electronic medical records system (EMR), lab reports, medical charts, discharge diagnosis, chief complaint, nurse and practitioner notes, diagnostic reports and consultations, etc. This text may be input manually to the informatics system or received electronically. This text may be processed to normalize the text or to extract certain non-essential text before further processing is done.

The method may thus employ a syntax ontology, a semantic ontology and a domain ontology as discussed above. Before delving into the method in more detail it may be helpful to elaborate on these types of ontologies as they may be applied to the method of processing clinical text. The syntactic ontology utilized may be selected based upon the expected language, format, type of text, environment to which the text may pertain, etc. The syntactic ontology may be used to provide tokens, including a dictionary of valid terms in a domain (lexicon), morphological and syntactic rules of the underlying language (such as valence and inflexions), and a grammar that sanctions or constrains allowable combinations of terms in a domain. The lexicon may also contain relationships such as synonymy, hyponymy (i.e., narrower), hypernymy (i.e., broader), polysemy (i.e., related terms), and meronymy (i.e., part of term) between terms (terminological knowledge) to be used for disambiguation and reducing the variability (normalization) of the output. FIG. 25 depicts one embodiment of tokens representing quantities defined in a syntax ontology.

The syntax ontology may be an OWL ontology that represents a lexicon for the generic and mainly non-clinical aspects of the clinical content. The model represents each lexeme in terms of a unique resource identifier (URI) that can be referred to by many morphologically different symbols. Each lexeme is modeled as an instance of at least one semantic class or concept in the Lexicon or Syntax ontology (for example, “ctm:Reject [reject, rejecting, rejected, rejects, . . . ]”). Each class may have further semantics as inferred by its definition within the ontology. For example, as depicted in FIG. 26, ctm:Reject may be a subclass of ctm:Active_Negation, whereas the ctm:Unable is an instance of both ctm:Subjective_Negation and ctm:Passive_Negation.

A text-understanding application intended to operate in a biomedical and clinical environment may use a domain ontology that formally describes domain concepts (for example, Diseases) and semantic relationships between them (for example, All Infectious Diseases are Caused by some Infectious Agent). In one embodiment, the domain ontology may be UMLS-SKOS, an OWL ontology that partially but consistently adopts the UMLS-SN for Semantic Web applications. FIG. 27 depicts a portion of the UMLS-SKOS domain ontology.

The UMLS-SKOS domain ontology maps each UMLS Semantic Type into a corresponding owl:Class and each UMLS Semantic Relationship into an owl:ObjectProperty. Concepts and Properties in this model have rdfs:subClassOf and rdfs:subPropertyOf relationships when there is an ‘is a’ relationship in the UMLS-KS.

In the UMLS-SKOS domain ontology, each UMLS-MTH concept represents a resource with a unique resource identifier (URI) constructed using a NameSpace:CUI schema, where NameSpace can represent any unique URL such as ‘umls=http://nih.nlm.gov/umls/’. All UMLS-MTH concepts are conceptualized to be instances of (rdf:type) the Concept representing their associated Semantic Type. For example, as depicted in FIG. 28, the “Plasminogen Inactivator” with the CUI=C0032145 is a resource uniquely identified by the uri=‘umls:C0032145’ in the UMLS-SKOS and has two semantic types of “Amino Acid, Peptide, or Protein” and “Biologically Active Substance”.

The semantics of each UMLS-SKOS resource (each UMLS-MTH concept) is defined by its source and through a variety of means: by a textual definition or annotation; by its Semantic Type and its place in the hierarchy; by source defined relationships between concepts; or by terminological relationships between terms (hyponymy, hypernymy, synonymy, etc.) defined by the UMLS-MTH. There are major groupings of Semantic Types incorporated in the UMLS-SN and therefore in the UMLS-SKOS for organisms, anatomical structures, biologic functions, chemicals, events, physical objects, and concepts or ideas.

The UMLS-SKOS domain ontology may allow for extensions that enable classification and reasoning in a range of applications related to the biomedical domains. For example, FIG. 29 depicts how two UMLS Semantic Types (Phenomenon_or_Process and Chemical_Viewed_Functionally) have been used to express logical constraints that define the new concept of ‘SubstanceAdministration’ inside the ontology to represent a new clinically meaningful pattern (an Observation that involves administration of at least one chemical with a known function, along with some optional dose, frequency and route information). Recall from the previous section that an observation in this model is a temporal entity; that is, a substance administration will be sanctioned to have a relationship with a temporal entity such as an absolute time (for example, Dec. 1, 2010 12:32 pm) or a relative time (for example, 2 hours ago).

The semantic ontology may be a generic and extensible ontology that represents the concepts that are likely to be found in text of the type being processed. A semantic ontology may serve as a high level schemata (information model) with a minimal set of semantic constraints that sufficiently represent major patterns identifiable in typical text of the type being processed and that enables extensions and mappings to more specialized ontologies to specialize it to meet particular requirements of a new use case or domain. The semantic ontology may define the meaning of lexical constituents of text and its syntactic components by mapping them to unique concepts and sensible relationships between them. In most systems semantic knowledge includes a set of explicit schemata that captures generalized semantically interpretable relationships between concepts, and semantic interpretation of template linguistic patterns observable or frequently used in the clinical content. That is, the semantic knowledge enables the algorithm to determine the proper relations between terms within the text, and to transform (map) them to desirable output formats.

The semantic ontology may be an OWL ontology that has been constructed to provide a generic and extensible information model for a prototypical clinical content. The model is conceptualized to serve as a high level schemata (information model) with a minimal set of semantic constraints that sufficiently represent major patterns identifiable in a typical clinical text, and in the meantime enable ad-hoc extensions and mappings to more specialized (for example, task specific) ontologies by systems that intend to specialize it to meet particular requirements of a new use case or domain.

The semantic ontology may also provide mapping points for importing new semantic and syntactic ontologies, or extending it dynamically to meet requirements of a new type of document or domain (for example, to add concepts pertaining to medications and prescriptions in a model originally intended to capture vital signs and physical exam data). The semantic ontology may include concepts such as clinical text and its different types such as chief complaint, relationships with presenter (for example, Patient, Nurse, EMS Personnel), Clinical Observation (for example, Sign, Syndrome, Disease, Procedure) and their Locus (for example, Body Site or Region, Body Part), Modifiers (for example, QualitativeModifier and QuantitativeModifer), and Clinical Contexts (for example, Temporal_Context, Causation_Context, Process_Context, Allergy_Context, History_Context) that can further explain implications of Clinical Observations. FIG. 30 graphically depicts a portion of one embodiment of a semantic ontology.

With these syntax, semantic and domain ontologies in mind, attention is directed back to FIG. 24 and the method for representing and contextualizing clinical text depicted therein. Text, such as clinical text, may be received from a data source at step 2410. The received text may be prepared or processed to put the text in a format for parsing. At step 2420 the text may be parsed according to a syntactic ontology. This parser may perform a text parsing and syntactic analysis. The results of the syntactic analysis form a parse graph that is comprised of tokens of text mapped to concepts of the syntax ontology.

In one embodiment, parsing may occur by creating evidence spaces from the input text (for example, by segmenting the text (segments of text are referred to as evidence spaces) according to identifiers defined in the syntax ontology). Chunks can then be created within each evidence space by using an iterative algorithm which creates permutations of all possible chunks of size 5 (plus or minus 2) within the evidence space. Within each of the evidence spaces, rules can be used to exclude zero or more of the chunks. Such a parser may not be dependent on the syntax of language as it uses chunks (tokens) and may utilize a moving window to account for cognitive aspects of human produced text. Accordingly, such a parser may be utilized effectively, even with grammatically incorrect or structurally aberrant text (often produced by doctors).

More specifically, in one embodiment, the parser may compute an indexed array of all permutations of tokens extractable from input text based on the position of syntactic concepts (represented in the syntactic ontology) in the input text. A token is any ordered combination of words extracted from text. Tokens may be defined by their positional index (their distance from the beginning of the text) and their length (number of words they contain). Tokens can overlap, contain or trail each other.
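The indexed array of tokens may be illustrated with the following minimal sketch. It assumes that tokens are contiguous word sequences within one evidence space and that the window is capped at seven words (five plus two); both are interpretive assumptions, and the function name and sample text are hypothetical.

```python
def enumerate_tokens(words, min_len=1, max_len=7):
    """List every contiguous token as (positional index, length, text)."""
    tokens = []
    for start in range(len(words)):
        for length in range(min_len, max_len + 1):
            if start + length > len(words):
                break  # the window would run past the end of the evidence space
            tokens.append((start, length, " ".join(words[start:start + length])))
    return tokens

# One evidence space, split on whitespace for simplicity.
evidence_space = "large blister on toes and abdomen".split()
tokens = enumerate_tokens(evidence_space)
```

Each token carries its positional index and length, so overlapping, containing and trailing tokens can later be related to one another without re-reading the text.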

The parser first scans through the text to create larger segments of text based on syntactic concepts found in the syntactic ontology. An evidence space may be a token closest to a sentence or a phrase. A sentence in the text may therefore comprise multiple or a single evidence space. These evidence spaces are ordered, and are parsed individually to create all permutations of legible tokens based on the above heuristics while the parser maintains the order of the evidence spaces according to the text.

To reduce the size of the combinatorial space, an algorithm based on regular expressions uses the lexicon provided by the syntactic ontology to identify and tag tokens with the least possibility of representing a single unique concept (for example, tokens containing dates, time, numbers, separators, etc.), or those tokens whose type is already identifiable by mappings between the syntactic model and the semantic model (for example, named objects (People, Devices), units of measurement, negation, etc.).

A parse graph can then be generated wherein the parse graph represents a sequence of evidence spaces and within each evidence space chunks and their dependencies, for example, tokens extracted from the text and their positional relationships. This graph representation may represent the concepts and relationships of the text. In one embodiment, the generation of the parse graph may include representing chunks as RDF, assigning URIs and representing relationships between the chunks. The parse graph may be a directed graph with a non-hierarchical structure (a network) that maintains an index of all tokens and their positional information from the original text as well as their containment information, as a token may contain other tokens (for example, the token related to “left arm” also contains the tokens “left” and “arm”, which once linked form a small sub-graph).
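Containment and positional relationships between tokens may be derived from their index and length, as the following sketch suggests. The relationship names "contains" and "precedes" and the tuple representation are assumptions chosen for illustration; an actual parse graph would represent tokens as RDF resources with URIs.

```python
def positional_edges(tokens):
    """Derive containment and adjacency edges from (index, length, text) tokens."""
    edges = []
    for a_start, a_len, a_text in tokens:
        a_end = a_start + a_len
        for b_start, b_len, b_text in tokens:
            b_end = b_start + b_len
            if (a_start, a_len) == (b_start, b_len):
                continue  # skip comparing a token with itself
            if a_start <= b_start and b_end <= a_end:
                edges.append((a_text, "contains", b_text))
            elif a_end == b_start:
                edges.append((a_text, "precedes", b_text))
    return edges

tokens = [(0, 1, "left"), (1, 1, "arm"), (0, 2, "left arm")]
edges = positional_edges(tokens)
```

For the “left arm” example the three resulting edges form the small sub-graph described above.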

FIG. 31 depicts a representation of one embodiment of a parse graph. A parse graph may represent a set of ordered evidence spaces (here, for example, evidence spaces 1, 2, 3, 4). In particular, here, the evidence space 1 is represented. The evidence space 1 may represent the text “Large Blister on Toes and Abdomen.” Notice here that tokens of the evidence space may be ordered (for example, the token “large” is before the token “blister” which is before the token “toes”, etc.). A large token may contain smaller tokens (for example, the token “large blister on toes” contains the tokens “large blister”, “on” and “toes”, etc.). A parser can effectively query this parse graph to extract a parse tree consistent with the phrase structure grammar, or a dependency diagram consistent with a dependency grammar. FIG. 32 depicts the corresponding output of a syntactic parser using a typical context free grammar or dependency grammar.

At step 2430 the graph representation of the text (parse graph) may then be mapped to a domain ontology to form a unified graph comprising the parse graph representing the text, the syntax ontology and the domain ontology. Using the mappings between the graph representing the text and the domain ontology, and previously established mappings between the domain ontology and the semantic ontology, the graph representing the text may be mapped to a semantic ontology. In this manner, a unified graph comprising the graph representing the text, the domain ontology and the semantic ontology can be formed.

More specifically, in one embodiment, concepts of the parse graph may be mapped to concepts in the domain ontology using a matching algorithm such as the MMTx algorithm, as discussed above. In one particular embodiment, the MMTx linguistic analysis and concept mapping tool from NLM may be used to map eligible tokens in the parse graph to the UMLS-MTH. While all eligible tokens may be processed by the MMTx, only tokens with a MMTx mapping score of 1000 (a perfect match with at least one UMLS-MTH concept) may be mapped. The CUI and Semantic Types associated with the token are returned as the results of the application of the MMTx algorithm. The MMTx algorithm may be utilized to add the link between a given token and a corresponding CUI using the :correspondsToCUI property. This associates the token with the UMLS-SKOS resource defining the corresponding CUI and its Semantic Type(s). As soon as a token is linked to a corresponding CUI, the class membership of the token with a corresponding class in the Semantic ontology may be established.
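The score threshold and the resulting link may be sketched as follows. The candidate records below are hand-written stand-ins for matcher output and do not reflect the actual MMTx interface or real scores; the second CUI is a placeholder. Only the perfect score of 1000, the umls: prefix and the :correspondsToCUI property come from the description above.

```python
PERFECT_SCORE = 1000  # a perfect match with at least one UMLS-MTH concept

def link_tokens(candidates):
    """Keep only perfect matches and emit (token, property, CUI resource) triples."""
    triples = []
    for candidate in candidates:
        if candidate["score"] == PERFECT_SCORE:
            triples.append(
                (candidate["token"], ":correspondsToCUI", "umls:" + candidate["cui"])
            )
    return triples

# Hypothetical matcher output; scores and the second CUI are illustrative only.
candidates = [
    {"token": "plasminogen inactivator", "cui": "C0032145", "score": 1000},
    {"token": "inactivator", "cui": "C0000000", "score": 861},
]
links = link_tokens(candidates)
```

Tokens that fall below the threshold simply receive no link and can be handled by the later filtering step.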

In one embodiment of the system, and using the mapping of the concepts in the parse graph (for example, the RDF graph generated by the syntactic analysis of the parser), a mapping algorithm tries to connect each token of the parse graph with some concept (for example, owl:Class) from the semantic ontology. That is, the parse graph is further extended by information regarding mapping of each token to a related concept from the syntax or semantic ontologies. Each token in the resulting RDF graph is represented as an instance (rdf:type) of at least one concept (owl:Class) from the semantic ontology. Extensions and modifications to the ontology representing the semantic ontology may affect the class membership and classification results. This can be used as a vehicle to customize and contextualize the behavior of the system for different use cases, without changing the algorithm.

FIG. 33 depicts one embodiment of a unified graph comprising the tokens of a parse graph, a semantic ontology (here InfM) and a domain ontology (here UMLS-SKOS). In one embodiment an example of an RDF output associated with such a unified graph related to the text “a 13 years old teenager with nausea and vomiting after drinking bad milk. has taken Reglan that made her drowsy and confused. no fever and headache. Feels tingling on finger tips and around his mouth. dry skin in observation” may look like the following:

In one embodiment, after the mapping described above is complete a filter function may discard from the parse graph all tokens that have failed to map to at least one concept in the semantic model. At this stage the process of extraction and encoding may be complete in that the interaction of the tokenization, mapping and filtering functions has extracted all meaningful concepts identifiable using the combination of the system lexicon, the terminological and domain knowledge (UMLS-SKOS) and the semantic ontology.

A semantic interpreter may add an index to all tokens based on their semantics extractable from the syntax and semantic ontology, and their linkage to the domain ontology (for example, UMLS-SKOS). The indexer uses heuristics associated with the allowable distance for related concepts (for example, five as discussed above), syntactic cues from the syntax ontology (for example, the role of ‘and’, ‘or’, ‘in, on, into, upon, of’ etc.), and semantic relationships defined in the semantic and domain ontologies to transform the parse graph into a conceptual graph in which tokens are related to each other based on a set of generic relationships other than their position in the text. Relationships between tokens in the conceptual graph are similar in utility to the edges in a dependency diagram, in that they indicate a relationship between tokens without making an assumption about its nature or specific meaning.

FIG. 34 depicts an example of a conceptual graph. Note that the tokens related to “Rash” and “Scar” both are related to the “Face” through a “precede” property but have no relationships with each other, and that the semantics of how this precedence should be interpreted, and what it may mean in any context, is not represented.

FIG. 35 depicts the formal RDF output corresponding to the conceptual graph of FIG. 33. The conceptual graph may be an intermediate output that represents tokens of clinical text mapped to concepts from ontologies with formal semantics and encoded with at least one UMLS-MTH CUI when possible, linked to each other and to their meaning in the ontologies available to the system. This enables any third party parser, classifier, or reasoner to be able to use the conceptual graph for further processing, querying and contextualization to construct outputs specific to their local needs, without having to utilize the specific ontologies used by the informatics system. This enables reuse and repurposing of such a conceptual graph in other contexts.

In any event, the unified graph comprising the tokens of the parse graph, the semantic ontology and the domain ontology may be searched at step 2440 to obtain data about received clinical text. As discussed above, the interface presented to the user may provide an open framework for the user to construct queries according to the context of the domain ontology. Specifically, the interface may present the users with the set of concepts or relationships utilized in the domain ontology to allow the user to formulate queries based on these concepts and relationships. Searches can then be formed and conducted based on the domain ontology. In this manner users are provided with a highly effective and contextual method for extracting meaning from obtained data. In particular, the concepts in the domain ontology specified by the user using the interface may be used as starting points in the unified graph and the graph navigated from these starting points to determine survey data responsive to the user's query.

In addition to processing clinical text, embodiments of an informatics system may utilize a substantially automated method of creating a unified graph based on a structured dataset (which may, for example, be received from a data source), such as an XML document formed as an XML message or the like, or data formed according to a database schema employed by a data source. Specifically, in one embodiment, the structured dataset may be received and a graph representation of an ontology that describes the structure or types of data from the data source may be constructed. A graph representing the actual data of the data set may then be constructed based on the ontology describing the structured data to create a unified graph comprising the ontology and the graph representation of the data of the dataset. This unified graph may then be used for a variety of purposes. For example, in one embodiment, concepts in the ontology may be mapped to a domain ontology or the like such that a unified graph can be created from the ontology representing the source, the graph representing the data of the structured data and the domain ontology. Such a unified graph can then be searched according to the concepts and relationships of the domain ontology.

FIG. 46 depicts one embodiment of creating a source ontology based on structured data representing particular data. In particular, the relationship between the input (structured data such as an XML message), the outputs (source ontology or TBOX and ABOX (population of the ontology with the data from structured data)), and the intermediate representation (for example, an isomorphic RDF graph) is depicted. It will be noted that the isomorphic RDF graph may be disposed of after the ABOX is populated.

Here, the data set comprising a structured representation of data from the data source may be translated to a graph representation comprising a source ontology (TBOX) and the formal representation of data described by the ontology (ABOX). An ontology for the data source (which is referred to as a source ontology or the TBOX for the data source) may be created automatically based on the graph representation of the received structured data. Once the source ontology is constructed, the data from the data source may be represented as a graph (referred to as the graph representation of the data or the ABOX for the received data) by populating instances of the concepts in the ontology for the data source (the TBOX).

FIG. 47 depicts one embodiment of a method of creating an ontology for a data source and representing data from a data source according to the created ontology in more detail. The method depicted may utilize a core schema ontology, which may comprise knowledge on the construction of structured documents and which may form a unified graph with a datatype ontology, which is a representation of types of data which may exist in a data source. Specifically, a datatype ontology introduces a simple classification of datatypes that are expected to be found in the structured data. It starts with the notion of basic datatypes such as numbers, strings, datetime, etc. Each datatype may get further extended to include subtypes, for example integer or float in the case of numerical datatypes. FIG. 52 depicts one embodiment of a portion of a datatype model.
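The simple datatype classification described above may be sketched as follows. The datatype names, the parent table and the inference order are assumptions made for illustration and are not the datatype model of FIG. 52; datetime detection here relies on ISO 8601 strings only.

```python
from datetime import datetime

# Hypothetical subtype-to-basic-type table; basic types have no parent entry.
PARENT = {"integer": "number", "float": "number"}

def infer_datatype(value):
    """Guess the most specific datatype of a raw string value."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(value)
        return "datetime"
    except ValueError:
        pass
    return "string"

def ancestors(datatype):
    """Walk up the hierarchy from a subtype to its basic datatype."""
    chain = []
    while datatype in PARENT:
        datatype = PARENT[datatype]
        chain.append(datatype)
    return chain

age_type = infer_datatype("55")
```

A value such as "55" is classified as an integer, which in turn is a kind of number, mirroring the extension of basic datatypes into subtypes.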

The core Schema model or CXM imports the datatype ontology and describes any given structured data set in terms of two aspects: 1) the hierarchy (for example, in an XML document it would be a formal description of XML Elements and XML Attributes, and the child-parent relations between them) and 2) the Concept Expressions. Concept expressions describe each and every data element (e.g., XML node, including both XML Elements and XML Attributes) in terms of what kind of information it brings to bear. In this ontology, every data element may be categorized as the main concept being described by other data elements (SchemaExpression) or it may be categorized as some metadata about a main concept (MetaDataExpression). For example, in the case of an XML document, it formally establishes the simple assumption that there is only one concept (SchemaExpression) to be described in each and every XML Element, and that all other concepts in the XML Element are basically some description (MetadataExpression) of that SchemaExpression.

In the case of a relational database this can be described as follows: all primaryKey Identifier columns of a given table are represented as a node categorized as SchemaExpression (and there can be only one of them per table row) and all other fields are child nodes of that node and categorized as MetaDataExpression. In both examples the informatics system first establishes a hierarchy between nodes, and then maps them to some ConceptExpression. The Concept Expressions have their own extensions. That is, both SchemaExpression and MetaDataExpression can be further specialized and further described by more specific definitions. For example, in an XML element <Data Patient=“1023” Age=“55” Race=“White”/> the Patient node is the SchemaExpression, and all other nodes are MetadataExpressions. This ontology enables binding any given node to a meaning or use case, or combining it with other nodes to compose a new meaning based on data from different aspects of multiple nodes.
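The categorization described above can be illustrated with a minimal sketch. It assumes that the name of the main-concept attribute is supplied through a hypothetical configuration set (SCHEMA_EXPRESSION_NAMES); the disclosure does not prescribe how that choice is made, and the function name is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: attribute names treated as the main concept.
SCHEMA_EXPRESSION_NAMES = {"Patient"}


def classify_attributes(xml_text):
    """Label each attribute of one XML element as a concept expression."""
    element = ET.fromstring(xml_text)
    labels = {}
    for name in element.attrib:
        if name in SCHEMA_EXPRESSION_NAMES:
            labels[name] = "SchemaExpression"
        else:
            # Every other attribute describes the main concept.
            labels[name] = "MetadataExpression"
    return labels


labels = classify_attributes('<Data Patient="1023" Age="55" Race="White"/>')
```

In this sketch the Patient attribute is the single SchemaExpression of the element, and Age and Race are labeled as MetadataExpressions.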

For example, a global patient identifier can be constructed automatically for all patients by combining the data from a Patient ID node with the data from a Hospital ID node. This constructs a new identifier concept for each patient that is unique in the context of multiple hospitals, therefore eliminating the possibility of two different patients with similar IDs from two different hospitals being mistaken for each other. FIG. 53 depicts one embodiment of a snapshot of the core Schema ontology and its extensions that may be used to instantiate XML nodes and Concept Expressions.
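As a hedged illustration of combining two nodes into a new identifier, the following sketch joins a hospital identifier and a patient identifier. The function name and the ":" separator are assumptions made for this example; the separator is assumed not to occur inside either identifier.

```python
def global_patient_id(hospital_id: str, patient_id: str) -> str:
    """Combine two node values into one identifier unique across hospitals."""
    # Assumption: ":" does not occur inside either identifier.
    return f"{hospital_id}:{patient_id}"


# The same local patient ID at two hospitals yields two distinct global IDs.
id_a = global_patient_id("HospitalA", "1023")
id_b = global_patient_id("HospitalB", "1023")
```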

Moving to the actual algorithm, first at step 4710 a schema parser algorithm may use the core Schema ontology (CXM) to parse received structured data from a data source to create a source specific schema model (XMODEL) corresponding to the data source from which the structured data was received. The CXM ontology may be used to parse any incoming structured data to extract its schema and map it to a source specific XMODEL ontology. One may think of the XMODEL ontology as a model whose TBOX is CXM and which is populated by the Schema information extractable from the received structured data. It does not contain the actual data from the structured data, only the information model corresponding to the received structured data. FIG. 54 depicts one embodiment of a source specific population of the XMODEL Ontology. Here, a Chart XML node is instantiated and mapped to its XML Expressions (it is modeled as an XML Attribute node and it expresses a UniqueIdentifierMetadata for another node in the same element).

The XMODEL may then be utilized by a Structured Data to RDF mapping algorithm to create a graph representation of the received structured data at step 4720. This graph representation may be an RDF representation of the structured data based on the descriptions in the XMODEL and contains the actual instances of the data contained in the structured data. In one embodiment, once the schema of the structured data is known by the XMODEL ontology, the incoming structured data may be consumed and turned into an Isomorphic RDF graph whose nodes are mapped to the nodes of the XMODEL ontology (another RDF graph) that formally describes the information model of the structured data. This mapping creates a unified graph that may be used by future steps to associate any given data node with its description in the XMODEL ontology and make inferences about them. This graph may be isomorphic in that its schema is morphologically similar or identical to the original schema of the structured data; that is, it preserves the same kind of hierarchical relations within the RDF nodes (using the hasXmlChildNode property) as observed in the structured data.

This graph representation may be used by a TBOX modeler algorithm to create a TBOX integrative model at step 4730. This TBOX integrative model may be a graph representation of all concepts that may be contained in the data received from the data source and may be mapped to a core data model ontology (CDM), a high level ontology that provides concepts from which other TBOX concepts can be derived or extended. The CDM plays the role of an upper ontology for all ontologies generated by this algorithm and enables future integration of all ontologies (TBOX) constructed by this algorithm into a unified model.

In one embodiment, the unified graph resulting from step 4720 is navigated and a new class for every single SchemaExpression and MetadataExpression in the unified graph is created inside the TBOX if it does not already exist. A corresponding property for each concept can also be created if it does not exist. Most properties are extensions of the skos:broader and skos:narrower properties to convey hierarchical relations extractable from structured data. The hierarchical information from the Isomorphic RDF graph or structured data is lost in this model and a substantially flat list of concepts is generated in the model. The hierarchical information is extracted into a complementary model called S-Model (which stands for SKOS model) that is designed to persist the hierarchical information in a model, without incorporating it for inferencing or querying inside the model.

FIG. 55 depicts one embodiment of a high level ontology that is used to extend the TBOX (the upper ontology). This may be a rather small ontology that grows bigger and bigger as new concepts are being discovered and added to this ontology.

FIG. 56 depicts a snapshot of one embodiment of a TBOX extracted from an isomorphic RDF graph. It may be noted that, in one embodiment, the hierarchical representation in the left pane of FIG. 56 may be constructed using the information from the S-Model and illustrates the hierarchical relationships between concepts according to the source data. Such hierarchical information may not be incorporated into the TBOX directly, since it cannot be guaranteed that all hierarchies are of type ‘inheritance’ (some may be non-formal hierarchies). That is, one cannot guarantee that because the data is organized into a hierarchy in a source dataset, child nodes always inherit properties of the parent node. In order to avoid mischaracterization of data during inference and querying, the algorithm separates information about hierarchic relations between concepts in the TBOX and persists it in a separate module as a non-formal hierarchy (using extensions of skos:broader or skos:narrower), which does imply a hierarchy (super-concept and sub-concept) but does not imply inheritance (e.g., rdfs:subClassOf). Similarly, hierarchy information can be extracted by parsing the values of the hasClassPath property, which the TBOX modeler algorithm adds to each and every concept in the TBOX to annotate, for human use, the hierarchical location of that concept as extractable from the original data.

An ABOX population algorithm may utilize the TBOX model and the graph representation of the structured data received from the data source to construct a graph representation of the actual data (ABOX) received from the data source at step 4740, where the graph representation of the actual data (ABOX) received from the data source is mapped to the TBOX model. Such an algorithm may import the updated TBOX produced by step 4730 and populate it with information extracted from the unified graph produced by step 4720 (for example, the isomorphic RDF graph). The ABOX joins the hierarchical relations between the nodes of the received structured data together, for example, using properties that may be extensions of skos:broader or skos:narrower.

FIG. 57 depicts one embodiment of a portion of an ABOX. A node (right panel) is related to all other nodes extracted from the isomorphic graph and mapped to the TBOX (left panel). FIG. 58 depicts one embodiment of a snapshot of an XML message that can be converted to a TBOX representation and an ABOX created using the structured data to ontology method as described above. As is apparent, most nodes without specific meanings are completely filtered out, and the model that remains in the generated ontology is remarkably richer and more formal, without information loss.

It may be useful here to go into more detail with respect to each of the algorithms depicted in FIG. 47. Moving then to FIG. 48, one embodiment of a method for a schema parser is depicted. This schema parser takes as input a structured data set from a data source and uses the core Schema ontology to populate a source specific model (XMODEL). The schema parser may traverse the schema of the received structured data at step 4810. The nodes containing some data within the structured data may be extracted at step 4820. For each of the nodes it can then be determined if a node (for example, represented in RDF) already exists in the source specific XMODEL to represent the Schema information for the node. If such a node exists, at step 4830 no action is taken and the next child node is evaluated. However, if no such node exists, at step 4840 a node in the source specific model may be created (for example, an RDF node) that uniquely describes any node in the structured data that may have a similar position (Path) to the node in question.
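The traversal of steps 4810 through 4840 can be sketched as follows, under simplifying assumptions: the XMODEL is modeled as a plain dictionary keyed by path rather than as RDF, paths are written in an XPath-like notation chosen for this example, and the names build_xmodel and register are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_xmodel(xml_text):
    """Map each distinct data-bearing path to one schema description."""
    xmodel = {}

    def register(path, kind):
        if path in xmodel:
            return  # step 4830: a node for this path exists, no action
        xmodel[path] = {"kind": kind}  # step 4840: create the schema node

    def visit(element, path):
        # Only nodes that actually contain data are extracted (step 4820).
        if element.text and element.text.strip():
            register(path, "XmlElement")
        for name in element.attrib:
            register(f"{path}/@{name}", "XmlAttribute")
        for child in element:
            visit(child, f"{path}/{child.tag}")

    root = ET.fromstring(xml_text)
    visit(root, f"/{root.tag}")  # step 4810: traverse from the top
    return xmodel


xmodel = build_xmodel(
    '<Chart><Data Patient="1" Age="55"/><Data Patient="2" Age="60"/></Chart>'
)
```

Although the sample message contains two Data elements, the resulting model holds a single entry per path, since it records the information model and not the data values.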

The creation of such an RDF node may entail the application of a set of heuristics at step 4850, where the application of the set of heuristics may comprise mapping the RDF node to SchemaExpression and MetadataExpression nodes in the core Schema ontology, mapping the RDF node to annotation nodes in the core Schema ontology, mapping the RDF node to data types based on the data type ontology, mapping the node to unique identifier nodes using the concepts of the core Schema ontology, the identification of standard coding schemes (for example, ICD9, SNOMEDCT, etc.), the annotation of the node with Path and other metadata and, if the structured data is formatted as an XML document, the creation of the RDF node that represents the XML schema for that XML node.

In FIG. 49, one embodiment of a method for a structured data to RDF mapping is depicted. Embodiments of this method may be used to create an isomorphic RDF representation of structured data based on the XMODEL created using the above method. Beginning with the topmost data element of the received structured data at step 4910, the structured data can be traversed at step 4920, where the traversal of a node may comprise traversing to each of the child nodes of that node. For each node in the received structured data, then, at step 4930 the node in the XMODEL that represents the PATH (position) of that node may be located. A unique RDF node to describe that specific node can then be created at step 4940.

This newly created RDF node can be mapped to the XMODEL RDF node that describes the schema of the node at step 4950. At step 4960, hierarchy information that links the RDF node to the RDF nodes representing that node's siblings and parents in the structured data may be added to the node. At step 4970, other information about this node may be added, including, for example, attribute or column name, attribute or column value, element name (if the structured data is an XML document), etc.
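A minimal sketch of steps 4910 through 4970 follows, representing the isomorphic graph as plain (subject, predicate, object) tuples rather than with an RDF library. Only the hasXmlChildNode property is named in this disclosure; the other predicate names (describedBy, hasElementName, hasAttributeName, hasAttributeValue), the node naming scheme, and the path notation are assumptions made for illustration.

```python
import itertools
import xml.etree.ElementTree as ET


def xml_to_triples(xml_text):
    """Return triples that mirror the hierarchy of an XML message."""
    counter = itertools.count(1)
    triples = []

    def visit(element, path, parent_node):
        node = f"node_{next(counter)}"  # step 4940: unique node
        # Steps 4930/4950: link the node to the schema node for its path.
        triples.append((node, "describedBy", f"xmodel:{path}"))
        triples.append((node, "hasElementName", element.tag))  # step 4970
        if parent_node is not None:
            # Step 4960: preserve the parent-child hierarchy.
            triples.append((parent_node, "hasXmlChildNode", node))
        for name, value in element.attrib.items():
            attr_node = f"node_{next(counter)}"
            triples.append((attr_node, "describedBy", f"xmodel:{path}/@{name}"))
            triples.append((attr_node, "hasAttributeName", name))
            triples.append((attr_node, "hasAttributeValue", value))
            triples.append((node, "hasXmlChildNode", attr_node))
        for child in element:  # step 4920: traverse to each child
            visit(child, f"{path}/{child.tag}", node)

    root = ET.fromstring(xml_text)
    visit(root, f"/{root.tag}", None)  # step 4910: start at the top
    return triples


triples = xml_to_triples('<Chart><Data Patient="1023" Age="55"/></Chart>')
```

Unlike the schema model, this graph carries the actual values, so each occurrence of a node in the message yields its own graph node.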

Moving on to FIG. 50, one embodiment of a method for creating an ontology for the data source is depicted. This ontology may be a formal model of the structure and type of data found in the data source (according to the received structured data) and may be referred to as a TBOX or TBOX model of the data source. This TBOX model may be created by a TBOX modeler algorithm using the RDF representation of the structured data. Beginning with the node of the RDF from the XMODEL that represents the topmost node of the structured data at step 5010, the RDF nodes representing the attributes of the root element may be traversed at step 5020, where the traversal of a node may comprise traversing to each of the RDF elements representing the child elements of that node.

For each of the RDF nodes of an attribute (including the RDF nodes associated with the child elements), it can be determined at step 5030 if a node with the same name already exists, where the node may be a class in the TBOX model. If a node already exists in a system thesaurus and has the same PATH (position) or schema as described in the XMODEL, the next RDF node associated with a child element (or if the attribute has no more child nodes, the next attribute node) may be obtained. However, if a corresponding class does not exist, it can be determined at step 5040 if the RDF node as represented in the XMODEL is a Schema Expression or a Metadata Expression. If the RDF node in the XMODEL is a metadata expression, a TBOX concept with the RDF node's name may be created at step 5050. In an embodiment of the system an object property named “has”+“ClassName” may be created and added to the TBOX. In another embodiment of the system an object property named “has”+“Parent node ClassName” may be created and added to the TBOX. Then a node may be added to the system thesaurus that comprises concepts already represented at steps 5060 and 5070. Furthermore, if the RDF node is a Type expression, the TBOX concept with the RDF node name may be made a subclass of the class representing the parent node of the node corresponding to the RDF node for which the TBOX concept was created at step 5080.

Returning to step 5040, if the RDF node is a schema expression, a TBOX concept corresponding to the node name may be created at step 5090. Additionally, an object property named “has”+“ClassName” or “has”+“Parent node ClassName” may be created and added to the TBOX, and a node may be added to the system thesaurus that comprises concepts already represented at steps 5060 and 5070. Furthermore, if the RDF node represents an attribute, one or more TBOX concepts may be created for the values of the node at step 5092.

Additionally, if any RDF node describing the node in the XMODEL is mapped to a ConceptIdentifier class in the data type ontology, a new class will be added to the TBOX for each data value of the node in the structured data, and the system thesaurus will be updated. For example, for an XML document such as <Data PatientID=“12345” Age=“20” Race=“Black”/>, four concepts (PatientID, Age, Race, Black) may be added to the TBOX if the Race node is modeled as MetaDataExpression and ConceptIdentifier at the same time.

In one embodiment of the system, the values of ConceptIdentifier nodes can be forced to be instantiated as individuals instead of concepts in the TBOX ontology through some heuristics (for example, for all Standards Based concepts) or through configuration by a human modeler. For example, for an XML document such as <Data PatientID=“12345” Age=“20” Race=“Black”/>, three concepts (PatientID, Age, Race) may be added to the TBOX if the Race node is modeled as MetaDataExpression and ConceptIdentifier at the same time and further mapped to the ForcedInstantiation concept by a modeler. An additional node representing ‘Black’ will be instantiated as an individual of the ‘Race’ concept type.

Returning to step 5030, if a node already exists in the system thesaurus, but has a different PATH or position in the XMODEL, a class named “Super”+“ClassName” may be created if it does not already exist at step 5032, and the new class can be made a subClassOf this newly created superClass at step 5034. Following this, the set of steps beginning with step 5040 may be performed as described above.
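The thesaurus check and class creation of steps 5030 through 5070 might be sketched as below. This is a simplified illustration that ignores the distinction between schema and metadata expressions. The TBOX is modeled as a dictionary of sets, and the numeric suffix used to name a class whose name already exists at a different path is a hypothetical convention, since the disclosure does not state how such a class is named.

```python
def add_concept(tbox, thesaurus, name, path):
    """Add a class for (name, path) and return its name, or None if known."""
    known_paths = thesaurus.setdefault(name, set())
    if path in known_paths:
        return None  # same name and same path: move on to the next node
    if known_paths:
        # Same name at a different path (steps 5032 and 5034).
        super_name = "Super" + name
        tbox["classes"].add(super_name)
        # Hypothetical disambiguation of the new class name.
        class_name = f"{name}_{len(known_paths) + 1}"
        tbox["subclass_of"].add((class_name, super_name))
    else:
        class_name = name
    tbox["classes"].add(class_name)
    tbox["properties"].add("has" + class_name)  # "has"+"ClassName"
    known_paths.add(path)  # update the system thesaurus
    return class_name


tbox = {"classes": set(), "properties": set(), "subclass_of": set()}
thesaurus = {}
first = add_concept(tbox, thesaurus, "Age", "/Chart/Data/@Age")
repeat = add_concept(tbox, thesaurus, "Age", "/Chart/Data/@Age")
second = add_concept(tbox, thesaurus, "Age", "/Chart/Visit/@Age")
```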

Once the source ontology is created, this source ontology may be used to construct a graph representation of the actual data in the received structured data based on the source ontology. This process may be referred to as populating the ABOX (graph representation of the actual data) based on the TBOX (source ontology). Thus, a graph is formed representing the structured data, where the graph is unified with the source ontology describing the structured data from which the data was received.

In FIG. 51, one embodiment of a method for populating the ABOX with data corresponding to the XML message using the TBOX model is depicted. Beginning with the node of the RDF from the XMODEL that represents the root of the XML message at step 5110, the RDF nodes representing the attributes of the root element may be traversed, where the traversal of a node may comprise traversing to each of the RDF elements representing the child elements of that node.

For each of the RDF nodes of an attribute (including the RDF nodes associated with the child elements), the TBOX concept (class) representing that node (as created above) may be found at step 5120. An example of that class may be <owl:Class ID=#Age>. Once the class is found, an individual instance of that class may be created and assigned a unique URI at step 5130 (for example, <AGE ID=AGE_1>). Next, at step 5140 the object property that has the name “has”+“Class” in the TBOX will be obtained (for example, hasAge). At step 5150 the individual data element associated with the parent node of the RDF node being processed (for example, <Person ID=Person_1>) may be found. At step 5160 the child instance node can be linked to the parent instance node through insertion of the following statement in the ABOX: <parent instance> <hasProperty (has+ClassName)> <child instance node>. For example:

<Person_1> <hasAge> <Age_1>
<Person_1> <rdf:type> <Person>
<Age_1> <rdf:type> <Age>

Returning to step 5130, if the RDF node in the XMODEL has a literal value associated with it (for example, <Data Age=“20”/>), an rdf:Resource corresponding to the value can be created and linked to the newly created RDF node (for example, <rdf:Description rdf:about=“#Value_1”> with <Value_1> <hasLiteralValue> “25”^^xsd:integer) at step 5162. The individual data element may be linked to the RDF node representing the literal value (for example, <Age_1> <hasValue> <Value_1>) at step 5170. Additionally, if the value is a uniqueIdentifier, the value can be used as part of the URI for the newly created node (for example, ClassName+MD5(value)) at step 5180.
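The statements inserted at step 5160 and the URI construction of step 5180 can be sketched as follows, again using plain tuples in place of an RDF store. The function names and the tuple representation are assumptions for illustration; hashlib.md5 from the Python standard library stands in for the MD5 function mentioned above.

```python
import hashlib


def instance_uri(class_name, value):
    """Build an instance name as ClassName + MD5(value) (step 5180)."""
    return class_name + hashlib.md5(value.encode("utf-8")).hexdigest()


def link_child(abox, parent_instance, parent_class, child_class, child_instance):
    """Insert the statements that tie a child instance to its parent."""
    # <parent instance> <has+ClassName> <child instance>
    abox.append((parent_instance, "has" + child_class, child_instance))
    abox.append((parent_instance, "rdf:type", parent_class))
    abox.append((child_instance, "rdf:type", child_class))


abox = []
link_child(abox, "Person_1", "Person", "Age", "Age_1")
patient_uri = instance_uri("PatientID", "12345")
```

Deriving the name from a hash of the identifier value means that the same identifier always resolves to the same node, whichever message it arrives in.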

As discussed herein, embodiments of the informatics system presented may utilize a domain ontology. In one embodiment, the domain ontology may be UMLS-SKOS, an OWL ontology that partially but consistently adopts the UMLS-SN for Semantic Web applications. The UMLS-SKOS domain ontology maps each UMLS Semantic Type into a corresponding owl:Class and each UMLS Semantic Relationship into an owl:ObjectProperty. Concepts and Properties in this model have rdfs:subClassOf and rdfs:subPropertyOf relationships when there is an ‘is a’ relationship in the UMLS-KS. In the UMLS-SKOS domain ontology, each UMLS-MTH concept represents a resource with a unique resource identifier (URI) constructed using a NameSpace:CUI schema, where NameSpace can represent any unique URL such as ‘umls=http://nih.nlm.gov/umls/’. All UMLS-MTH concepts are conceptualized to be instances of (rdf:type) the concept representing their associated Semantic Type. The semantics of each UMLS-SKOS resource (each UMLS-MTH concept) is defined by its source and through a variety of means: by a textual definition or annotation; by its Semantic Type and its place in the hierarchy; by source defined relationships between concepts; or by terminological relationships between terms (hyponymy, hypernymy, synonymy, etc.) defined by the UMLS-MTH. There are major groupings of Semantic Types incorporated in the UMLS-SN, and therefore in the UMLS-SKOS, for organisms, anatomical structures, biologic functions, chemicals, events, physical objects, and concepts or ideas.

One embodiment of a method for the construction of such a UMLS-SKOS domain ontology from UMLS is depicted in FIG. 36. At step 3610 the UMLS-Semantic Network (UMLS-SN) is converted to a Simple Knowledge Organization System (SKOS) representation. The UMLS-Metathesaurus (MTH) model is then converted to SKOS at step 3620. This allows unification of any formal graph within the informatics system with the knowledge from UMLS, which can in turn augment mining, interpretation and integration of multisource information. The Metathesaurus portion of the ontology is populated with CUIs at step 3630. The source vocabularies of the UMLS ontology being created are then populated and mapped to the Metathesaurus model at step 3640. This method may be utilized, for example, to construct a UMLS-SKOS domain ontology and provide this UMLS-SKOS domain ontology to an informatics system for use as a domain ontology as discussed above.

To construct the UMLS-SKOS domain ontology, at step 3610 the UMLS-SN is first converted to SKOS representations. SKOS and SKOS-XL are first obtained for use. Next, the semantic types are set in the ontology by creating a single ontology concept (for example, owl:Class in the Semantic Web framework for knowledge representation) for each Semantic Type in UMLS. Semantic types (STY) may be created by querying the Semantic Network (SN) and adding a single class for each semantic type retrieved. These STY may be defined by adding all properties of each ontology class created based on the UMLS Semantic Network.

These classes can then be formed into SKOS by further defining every ontology class as a SKOS:Concept. Relationships are then created by querying the UMLS Semantic Network for all semantic relations and creating one property in the ontology for each semantic relation retrieved. These relationships are defined by adding a single ObjectProperty for each semantic relation in the UMLS Semantic Network. These relationships (REL) are then mapped to SKOS by making the Semantic Network properties subProperties of an appropriate SKOS:Relation.

Hierarchies can then be set in the UMLS-SKOS ontology. UMLS Semantic Types and UMLS Semantic Relations have defined hierarchies. This hierarchic information can be retrieved from UMLS and added into the UMLS-SKOS ontology being created. An STY hierarchy can then be created in the ontology by retrieving hierarchic information from UMLS and adding it to the UMLS-SKOS classes (for example, semantic types) created earlier. A REL hierarchy is built by retrieving hierarchic information from UMLS and adding it to the UMLS-SKOS properties (for example, properties).

Semantic relations are then set in the ontology. UMLS Semantic Types have defined relationships through the UMLS semantic relations. Those relations between classes (for example, semantic types) can be retrieved from UMLS and added into the UMLS-SKOS ontology being created. Thus, a triple whose subject and object are semantic types that are related through a semantic relation (STY REL STY) can be created.

FIG. 37 depicts a representation of one embodiment of Semantic Types converted to an ontology with their hierarchies preserved (left panel of the depicted interface). All concepts are fully defined by properties and relations extracted from UMLS (middle panel of the depicted interface). All semantic properties are extracted and mapped to an object property, along with their mappings to SKOS properties and their subProperty hierarchy.

The UMLS-Metathesaurus (MTH) is then created in the ontology at step 3620. UMLS-SN may be accessed and the UMLS version set by obtaining from UMLS the current version of the UMLS being converted. This information may be added to every concept extracted (or some subset of them) to mark the date and the version of the converted Metathesaurus.

The ConceptScheme is set by obtaining from the current version of the UMLS all source vocabularies (SAB) incorporated, and their current versions. These can then be mapped as skos:ConceptScheme concepts to the ontology being created. For each concept scheme, all root concepts that may be used to navigate the vocabulary can be found and added to the ontology being created as the skos:topConcept.

The SAB of the ontology may then be populated by querying the UMLS for all SAB and their metadata, including version, and populating the SAB of the ontology based on the response. The root concept of each SAB can be set by querying UMLS for the topmost (root) concept of each SAB and linking it to the SAB using umls:rootCUI. The SABs can then be mapped to SKOS by adding each SAB as an instance of skos:ConceptScheme to the UMLS-SKOS ontology being created. Metadata can then be added and the TopConcept link added as retrieved. FIG. 38 depicts one embodiment of an example SAB class (subclass of skos:ConceptScheme), and its instances and source vocabularies incorporated in the UMLS. In this example SNOMEDCT is shown with its metadata and rootCUI showing its topmost concept.

The UMLS-MTH Relations can then be set in the ontology by querying UMLS to obtain all distinct relations (REL and RELA) and creating their subProperty relationships according to the UMLS. These may be added to the ontology as owl:ObjectProperty. The labels (STR, AUI, SUI) in the ontology being constructed may then be set by creating owl:Classes and properties to represent STR, AUI and SUI according to their definition in UMLS. The TermTypes (TTY) for the ontology being created are similarly set by querying the UMLS for all TermTypes from mrDOC and adding them as owl:AnnotationProperty to the ontology. TermTypes are used for linking STR to CUI as extractable from the mrConso table. These types can then be mapped to SKOS. More specifically, for each UMLS TermType a corresponding skos:Label that best represents that label type is found. This may entail a mapping process comparing the definition of the term types in UMLS and finding the best match in SKOS.

Relation hierarchies are then set in the ontology by, for each UMLS relation, finding a corresponding super property and adding that as rdfs:subPropertyOf. This may entail a mapping process comparing the definition of the term types in UMLS and finding the best match. Symmetric relations are then set. If a property has an inverse relation with itself, that property is made symmetric. This is done by querying the mrDoc and mrRel tables for evidence of properties being in symmetric relations with each other through the same property. FIG. 39 depicts one embodiment of example properties extracted from the UMLS Metathesaurus and presented with their full hierarchic relations and mappings to SKOS.
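The symmetry test described above reduces to checking whether a relation is listed as its own inverse. The sketch below assumes the inverse pairs have already been read from the documentation table into a dictionary; the function name is hypothetical and the sample pairs are illustrative rather than a complete extract.

```python
def symmetric_properties(inverse_of):
    """Return the relations that are listed as their own inverse."""
    return sorted(rel for rel, inverse in inverse_of.items() if rel == inverse)


# Illustrative inverse pairs (relation -> inverse relation).
inverse_of = {"PAR": "CHD", "CHD": "PAR", "SIB": "SIB", "SY": "SY"}
symmetric = symmetric_properties(inverse_of)
```

Each relation returned by this check would then be declared a symmetric property in the ontology being created.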

The attribute model of the ontology being created can then be set. This may be accomplished by querying the attributes table in UMLS to create one single annotation property for each distinct attribute type in UMLS and adding that distinct attribute type as a subproperty of the umls:attribute property. FIG. 40 depicts one embodiment of STR, AUI, SUI classes (left pane), and corresponding properties (middle and right pane). The TermTypes (subProperties of STR) and other relations are also demonstrated.

At step 3630 the Metathesaurus portion of the ontology being created may be populated with CUIs. The CUIs of the ontology may be populated by, for each CUI, creating a single skos:Concept, adding all information into it using the properties created and added to the ontology previously (in the Semantic Network portion of the ontology being created), and making it an rdf:type of the Semantic Type classes created in the previous steps (for example, in the Semantic Network model). In one embodiment of the system, labels are set in the ontology being created by querying the mrConso table and adding all the STRs using the TermTypes extracted. Each term is compared with the UMLS designated preferred labels to distinguish between the skos:prefLabel and skos:altLabel properties that are used to designate labels. SUIs are then added. For each term extracted, an instance of the SUI class can be created using the skos-xl skos:Label class and attached to the CUI concept. That is, each CUI object will have two distinct ways of representing terms: using literals (using skos:prefLabel and skos:altLabel) and using objects (using skos-xl:prefLabel and skos-xl:altLabel). Synonymy can then be added by making all terms of a CUI mutual synonyms using the umls:synonymous property and adding them to the model. The umls:synonymous property is transitive.
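A minimal sketch of the label handling described above follows. It assumes the terms of a single CUI and its designated preferred term are already available as strings; the function names and the sample terms are illustrative, and the mutual synonym statements are produced as ordered pairs.

```python
import itertools


def split_labels(terms, preferred):
    """Separate the preferred label of a concept from its alternatives."""
    alternatives = [term for term in terms if term != preferred]
    return {"skos:prefLabel": preferred, "skos:altLabel": alternatives}


def mutual_synonyms(terms):
    """Pair every term with every other term of the same concept."""
    return [(a, "umls:synonymous", b) for a, b in itertools.permutations(terms, 2)]


# Illustrative terms for one concept.
terms = ["Myocardial Infarction", "Heart attack", "MI"]
labels = split_labels(terms, "Myocardial Infarction")
synonyms = mutual_synonyms(terms)
```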

Definitions can then be set by querying mrDef for all definitions of a CUI and adding them to the ontology using the skos:definition property. Semantic Types are set by querying mrSTY for the semantic types of a CUI and making each CUI object an rdf:type of the corresponding Semantic Type class in the ontology. Relations are set by querying the REL table for all REL and RELA relationships of a CUI with other CUIs and using the object properties extracted in the previous steps to link them in the ontology being created. The MTH attributes are then set in the ontology by querying the attributes table in the UMLS and adding values of all attributes associated with a CUI using the attribute properties extracted previously. FIG. 41 depicts a representation of a single CUI and its associated properties. FIG. 42 depicts a graph representation of the concept depicted in FIG. 41.

At step 3640 source vocabularies of the UMLS-SKOS ontology may be populated and mapped to the Metathesaurus portion of the ontology being created. The UMLS-MTH may be accessed. The concepts of the ontology may be set by, for each concept or term in a source vocabulary (SAB), creating a distinct skos:Concept associated with the ConceptScheme representing that source vocabulary. The concept can then be associated with its definitions, terms, and relations and linked to the CUIs that it corresponds to by querying the UMLS. Unique semantic identifiers (SUIs) may then be set. Each term or concept in a terminology system has at least one form of a unique identifier. That identifier is found and used to form a URI for the concept using the following method: UMLSNameSPACE+/+SABName+/+Unique Identifier. The labels for the concepts can then be set in the ontology being created by querying the mrConso table to identify terms specifically contributed by the SAB to that concept and adding all the STRs using the TermTypes extracted previously. Each term can be compared with the UMLS designated preferred labels to distinguish between the skos:prefLabel and skos:altLabel properties that are used to designate labels. For each term extracted, an instance of the AUI class is also created using the skos-xl skos:Label class and attached to the SAB concept. That is, each SAB object will have two distinct ways of representing terms: using literals (using skos:prefLabel and skos:altLabel) and using AUI objects (using skos-xl:prefLabel and skos-xl:altLabel). Synonymy may be added by making all terms of a CUI mutual synonyms when adding them to the model using umls:synonymous (which is transitive). Definitions can then be set for the concepts by querying mrDef for all definitions of the CUI associated with this object and adding those definitions using the skos:definition property.
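The URI construction method stated above (UMLSNameSPACE+/+SABName+/+Unique Identifier) can be sketched as a small helper. The function name is hypothetical, the namespace is the example URL given earlier in this description, and the identifier value is a placeholder.

```python
def sab_concept_uri(namespace, sab_name, unique_id):
    """Join namespace, source vocabulary name and identifier with slashes."""
    # Strip a trailing slash so the separator is not doubled.
    return namespace.rstrip("/") + "/" + sab_name + "/" + unique_id


# Placeholder identifier; the namespace is the example from the description.
uri = sab_concept_uri("http://nih.nlm.gov/umls/", "SNOMEDCT", "12345")
```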

Once the concept portion of the ontology being created is set, the relations can be set in the ontology by querying the mrRel table for all REL and RELA relationships of the unique identifier associated with the SAB object (AUI, SCUI, or CODE) with other unique identifiers and using the object properties extracted in the previous steps to link them in the ontology.

The Metathesaurus attributes can then be set in the ontology being created by querying the attributes table in the UMLS and adding values of all attributes associated with the SAB using the attribute properties extracted previously. The concepts can then be mapped to a CUI. This can be accomplished by querying mrConso for the mapping between the CUI and the SAB unique identifier and representing it using an instance of the umls:MapSet class. FIG. 43 depicts a representation of SABs, their labels and relations with each other. FIG. 44 depicts the SABs of FIG. 43 in an ontology editor. FIG. 45 depicts a representation of a graph for a portion of a domain ontology, where the domain ontology comprises a mapped and cross correlated vocabulary system that emerges out of overlaying multiple distinct graphs utilized in the above method.

In the foregoing specification, the invention has been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of invention.

Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

What is claimed is:
 1. A system for data mining, comprising: an informatics system, comprising a processor and a non-transitory computer readable medium comprising instructions for: receiving an input from one or more data sources; translating data of the input to a graph representation of the input based on a graph representation of a source ontology; obtaining a graph representation of a domain ontology, wherein the domain ontology comprises a set of concepts and a set of relationships; mapping the graph representation of the input to the graph representation of the domain ontology to create a unified graph comprising the graph representation of the input and the graph representation of the domain ontology; providing the ability to construct a query based on at least one of the set of concepts or at least one of the set of relationships of the domain ontology; and searching the unified graph based on the query to obtain data of the input associated with at the at least one of the set of concepts or the at least one relationships on which the query is based.
 2. The system of claim 1, wherein the domain ontology includes the unified medical language system (UMLS) or GALEN.
 3. The system of claim 2, wherein the domain ontology is represented in Simple Knowledge Organization System representation (SKOS).
 4. The system of claim 1, wherein the input is survey response.
 5. A survey system, comprising: an informatics system, comprising a processor and coupled to one or more data sources, the informatics system comprising a processor and a non-transitory computer readable medium comprising instructions for: creating a survey based on a survey ontology and a domain ontology, wherein the survey is a graph representation of a set of questions and a set of answers and the creation of the survey creates a unified graph between the survey, a graph representation of a survey ontology and a graph representation of domain ontology distributing the survey to one or more data sources; receiving a survey response from the one or more data sources; creating a graph representation of the survey response; and adding the graph representation of the survey response to the unified graph such that the unified graph includes the graph representation of the survey response.
 6. The system of claim 5, wherein the survey is rendered for presentation to a user on the one or more data sources.
 7. The system of claim 6, wherein the survey is rendered at the one or more data sources based on the survey ontology.
 8. The system of claim 5, wherein the computer readable medium further comprises instructions for: providing the ability to construct a query based on at least one of a set of concepts; and searching the unified graph based on the query to obtain data of the survey response associated with at the at least one of the set of concepts on which the query is based.
 9. A method for data mining, comprising: receiving an input from one or more data sources; translating data of the input to a graph representation of the input based on a graph representation of a source ontology; obtaining a graph representation of a domain ontology, wherein the domain ontology comprises a set of concepts and a set of relationships; mapping the graph representation of the input to the graph representation of the domain ontology to create a unified graph comprising the graph representation of the input and the graph representation of the domain ontology; providing the ability to construct a query based on at least one of the set of concepts or at least one of the set of relationships of the domain ontology; and searching the unified graph based on the query to obtain data of the input associated with at the at least one of the set of concepts or the at least one relationships on which the query is based.
 10. The method of claim 9, wherein the domain ontology includes the unified medical language system (UMLS) or GALEN.
 11. The method of claim 10, wherein the domain ontology is represented in Simple Knowledge Organization System representation (SKOS).
 12. The method of claim 9, wherein the input is survey response.
 13. A method for surveying, comprising: creating a survey based on a survey ontology and a domain ontology, wherein the survey is a graph representation of a set of questions and a set of answers and the creation of the survey creates a unified graph between the survey, a graph representation of a survey ontology and a graph representation of domain ontology distributing the survey to one or more data sources; receiving a survey response from the one or more data sources; creating a graph representation of the survey response; and adding the graph representation of the survey response to the unified graph such that the unified graph includes the graph representation of the survey response.
 14. The method of claim 13, wherein the survey is rendered for presentation to a user on the one or more data sources.
 15. The method of claim 14, wherein the survey is rendered at the one or more data sources based on the survey ontology.
 16. The method of claim 13, further comprising: providing the ability to construct a query based on at least one of a set of concepts; and searching the unified graph based on the query to obtain data of the survey response associated with at the at least one of the set of concepts on which the query is based.
 17. A non-transitory computer readable medium, comprising instruction for: receiving an input from one or more data sources; translating data of the input to a graph representation of the input based on a graph representation of a source ontology; obtaining a graph representation of a domain ontology, wherein the domain ontology comprises a set of concepts and a set of relationships; mapping the graph representation of the input to the graph representation of the domain ontology to create a unified graph comprising the graph representation of the input and the graph representation of the domain ontology; providing the ability to construct a query based on at least one of the set of concepts or at least one of the set of relationships of the domain ontology; and searching the unified graph based on the query to obtain data of the input associated with at the at least one of the set of concepts or the at least one relationships on which the query is based.
 18. The computer readable medium of claim 17, wherein the domain ontology includes the unified medical language system (UMLS) or GALEN.
 19. The computer readable medium of claim 18, wherein the domain ontology is represented in Simple Knowledge Organization System representation (SKOS).
 20. The computer readable medium of claim 17, wherein the input is survey response.
 21. A non-transitory computer readable medium, comprising instruction for: creating a survey based on a survey ontology and a domain ontology, wherein the survey is a graph representation of a set of questions and a set of answers and the creation of the survey creates a unified graph between the survey, a graph representation of a survey ontology and a graph representation of domain ontology distributing the survey to one or more data sources; receiving a survey response from the one or more data sources; creating a graph representation of the survey response; and adding the graph representation of the survey response to the unified graph such that the unified graph includes the graph representation of the survey response.
 22. The method of claim 21, wherein the survey is rendered for presentation to a user on the one or more data sources.
 23. The computer readable medium of claim 22, wherein the survey is rendered at the one or more data sources based on the survey ontology.
 24. The computer readable medium of claim 21, further comprising instructions for: providing the ability to construct a query based on at least one of a set of concepts; and searching the unified graph based on the query to obtain data of the survey response associated with at the at least one of the set of concepts on which the query is based.